Initial implementation of SwiftDBAI

Chat with any SQLite database using natural language. Built on AnyLanguageModel (HuggingFace) for LLM-agnostic provider support and GRDB for SQLite access.

Core features:
- Auto schema introspection from sqlite_master (zero config)
- NL → SQL generation via any AnyLanguageModel provider
- Three rendering modes: text summary, data table, Swift Charts
- Drop-in DataChatView (SwiftUI) and headless ChatEngine
- Operation allowlist with read-only default
- Mutation policy with per-table control
- ToolExecutionDelegate for destructive operation confirmation
- Multi-turn conversation context
- 352 tests across 24 suites, all passing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 09:30:56 -05:00
commit b1724fe7ca
55 changed files with 15506 additions and 0 deletions
--- a/PRD.md
+++ b/PRD.md
@@ -0,0 +1,468 @@
+# SwiftDBAI — Product Requirements Document
+
+> **SwiftDBAI** is the umbrella name for AI-powered SQLite database tooling. v1 ships `SwiftDBAI` (chat + SQL engine). Future versions may add `SwiftDBAIMCP` (MCP server mode).
+
+**Version:** 0.2 (Revised — post-pivot from SwiftDataAI)
+**Date:** 2026-04-04
+**Author:** Krishna Kumar
+
+---
+
+## 1. Problem Statement
+
+Developers building apps with SQLite databases have no natural-language interface to query, explore, or mutate their data. Debugging, prototyping, and building AI-powered features all require hand-writing SQL — even for simple questions like "show me all overdue tasks" or "how many users signed up this week."
+
+There is no drop-in Swift package that lets a user (or an LLM) **chat with any SQLite database** using plain English.
+
+---
+
+## 2. Vision
+
+**SwiftDBAI** is a Swift package that gives any SQLite-backed app a conversational interface to its data. Developers embed it in minutes; end users ask questions and get answers from their own data.
+
+The data layer is **all SQL via GRDB** — no SwiftData APIs, no `#Predicate`, no `FetchDescriptor`. SwiftDBAI works with **any SQLite database**, not just SwiftData stores. Schema discovery is automatic via `sqlite_master` introspection — zero configuration required. The developer passes their own GRDB `DatabasePool` or `DatabaseQueue`; SwiftDBAI never manages the connection lifecycle.
+
+Built on [**AnyLanguageModel**](https://github.com/huggingface/AnyLanguageModel) from Hugging Face — a unified Swift LLM abstraction that supports OpenAI, Anthropic, Gemini, Ollama, CoreML, MLX, and llama.cpp through a single API. SwiftDBAI generates SQL from natural language, validates it against a developer-configured operation allowlist, executes it via GRDB, and renders results as text, data tables, or Swift Charts.
+
+---
+
+## 3. Target Users
+
+| Persona | Need |
+|---|---|
+| **iOS/macOS Developer** | Drop-in chat UI + engine to add "talk to your data" features to any SQLite-backed app without building NLP pipelines |
+| **AI/LLM App Builder** | SQL generation layer that lets an LLM read/write any SQLite database through validated, allowlisted operations |
+| **Power User / Debugger** | In-app console to inspect and mutate SQLite data during development |
+
+---
+
+## 4. Goals & Non-Goals
+
+### Goals
+- Natural-language querying of any SQLite database via GRDB
+- **LLM-agnostic** via [AnyLanguageModel](https://github.com/huggingface/AnyLanguageModel) — works with OpenAI, Anthropic, Gemini, Ollama, CoreML, MLX, llama.cpp out of the box
+- Drop-in SwiftUI chat view that "just works" with zero configuration — provide a database path and a model
+- Schema-aware — automatically introspects tables, columns, types, primary keys, foreign keys, and indexes from `sqlite_master`
+- Read **and** write support (SELECT, INSERT, UPDATE, DELETE) with developer-configured operation allowlist and confirmation guards
+- All SQL validation via allowlist check — no SQL parser for safety, no `#Predicate` generation
+- UI rendering: text summaries + scrollable data tables + Swift Charts (bar, line, pie) — all in v1
+- Swift 6 concurrency safe, structured concurrency throughout (Swift 6 language mode, Swift 6.1 toolchain)
+- Works on iOS 17+, macOS 14+, visionOS 1+
+
+### Non-Goals (v1)
+- ORM integration: SwiftDBAI is not tied to Core Data, SwiftData, or any other ORM and works with raw SQLite
+- Building a general-purpose chat framework (data-scoped only)
+- Full SQL parsing for safety (v1 relies on the allowlist check alone; see Open Questions)
+- Training or fine-tuning models
+- Cloud sync of chat history
+- Managing database connections (developer owns the GRDB connection)
+
+---
+
+## 5. Architecture Overview
+
+```
+┌──────────────────────────────────────────────────────────┐
+│                       SwiftDBAI                          │
+├──────────┬───────────┬──────────────┬────────────────────┤
+│ Chat UI  │  Engine   │ Schema       │  SQL Pipeline      │
+│ (SwiftUI)│           │ Introspector │                    │
+└────┬─────┴─────┬─────┴──────┬───────┴───────┬────────────┘
+     │           │            │               │
+     ▼           ▼            ▼               ▼
+  ChatView   ChatEngine   sqlite_master    SQLQueryParser
+  DataChat   PromptBuilder  PRAGMA         OperationAllowlist
+  View       TextSummary    table_info     MutationPolicy
+             Renderer     foreign_key_list QueryValidator
+                            index_list
+┌──────────────────────────────────────────────────────────┐
+│            Rendering Layer                               │
+├──────────┬──────────────┬────────────────────────────────┤
+│  Text    │  DataTable   │  Swift Charts                  │
+│ Summary  │  (scrollable)│  (Bar, Line, Pie)              │
+└──────────┴──────────────┴────────────────────────────────┘
+┌──────────────────────────────────────────────────────────┐
+│                    GRDB.swift 7.0+                       │
+│              DatabasePool / DatabaseQueue                │
+└──────────────────────────────────────────────────────────┘
+┌──────────────────────────────────────────────────────────┐
+│              AnyLanguageModel (HuggingFace)              │
+├──────┬──────┬────────┬───────┬───────┬──────┬────────────┤
+│OpenAI│Claude│ Gemini │Ollama │CoreML │ MLX  │ llama.cpp  │
+└──────┴──────┴────────┴───────┴───────┴──────┴────────────┘
+```
+
+### 5.1 Core Modules
+
+| Module | Responsibility |
+|---|---|
+| **SchemaIntrospector** | Queries `sqlite_master`, `PRAGMA table_info`, `PRAGMA foreign_key_list`, and `PRAGMA index_list` to auto-discover all tables, columns (name, type, nullability, defaults), primary keys, foreign keys, and indexes. Produces a `DatabaseSchema` model the LLM uses as context. Zero configuration — no annotations or model definitions needed. |
+| **SQLQueryParser** | Extracts SQL from the raw LLM response, detects the operation type (SELECT/INSERT/UPDATE/DELETE), validates it against the `OperationAllowlist`, enforces `MutationPolicy` table restrictions, and flags destructive operations that require confirmation. |
+| **OperationAllowlist** | Developer-configured set of permitted SQL operations. Presets: `.readOnly` (SELECT only, the default), `.standard` (SELECT + INSERT + UPDATE), `.unrestricted` (all including DELETE). |
+| **MutationPolicy** | Builds on `OperationAllowlist` with per-table restrictions. Controls which mutations are allowed on which tables. DELETE requires confirmation by default. |
+| **QueryValidator** | Extensible protocol for custom pre-execution validation rules (e.g., `TableAllowlistValidator`, `MaxRowLimitValidator`). Developers implement `QueryValidator` to add domain-specific checks. |
+| **ChatEngine** | Orchestrates the full pipeline: schema introspection (once, lazily) -> system prompt with schema context -> LLM generates SQL -> `SQLQueryParser` validates -> GRDB executes -> `TextSummaryRenderer` summarizes -> response. Supports multi-turn conversation with configurable context window. |
+| **PromptBuilder** | Constructs the LLM system prompt including the introspected schema description, allowlist rules, and optional developer-provided context. |
+| **TextSummaryRenderer** | Uses the LLM to generate natural-language summaries of query results. Configurable max rows for summarization. |
+| **ChatView / DataChatView** | Drop-in SwiftUI views. `DataChatView` is the zero-config entry point (database path + model). `ChatView` accepts a `ChatViewModel` for full control. Renders message bubbles, scrollable data tables, Swift Charts (bar/line/pie via `ChartDataDetector`), and error states. |
+
+### 5.2 Data Flow
+
+```
+User types: "Show me all tasks due this week"
+       │
+       ▼
+ChatEngine ensures schema is introspected (via SchemaIntrospector)
+  - Queries sqlite_master, PRAGMA table_info, foreign_key_list, index_list
+  - Caches DatabaseSchema for subsequent queries
+       │
+       ▼
+PromptBuilder constructs system prompt with:
+  - Full schema description (tables, columns, types, keys, indexes)
+  - OperationAllowlist rules
+  - Optional developer context
+  - Conversation history (within context window)
+       │
+       ▼
+LanguageModelSession.respond(to: userMessage)
+  → AnyLanguageModel routes to configured provider (OpenAI / Anthropic / Ollama / ...)
+       │
+       ▼
+LLM returns raw SQL: "SELECT * FROM tasks WHERE dueDate >= date('now', '-6 days', 'weekday 0') AND dueDate < date('now', '+1 day', 'weekday 0') ORDER BY dueDate ASC"
+       │
+       ▼
+SQLQueryParser:
+  1. Extracts SQL from LLM response (strips markdown fences, etc.)
+  2. Detects operation type → SELECT
+  3. Validates against OperationAllowlist → allowed
+  4. Checks MutationPolicy table restrictions (if applicable)
+  5. Runs custom QueryValidators
+       │
+       ▼
+GRDB executes SQL via DatabasePool/DatabaseQueue
+  → Returns rows as [[String: Value]] with column names
+       │
+       ▼
+TextSummaryRenderer asks LLM to summarize results in natural language
+ChartDataDetector checks if results are chart-eligible
+       │
+       ▼
+ChatView renders: text summary + scrollable DataTable + Swift Charts (if applicable)
+```
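+
+Steps 1 to 3 of the `SQLQueryParser` stage above can be sketched in plain Swift. This is an illustration under stated assumptions, not the package's implementation: `SketchOperation`, `extractSQL`, `detectOperation`, and `isAllowed` are hypothetical names, only the first fenced block is considered, and statements that do not start with one of the four keywords (for example a `WITH` clause) are rejected.
+
+```swift
+import Foundation
+
+// Hypothetical stand-in for the package's operation type.
+enum SketchOperation: String {
+    case select = "SELECT", insert = "INSERT", update = "UPDATE", delete = "DELETE"
+}
+
+/// Step 1: return the SQL inside the first Markdown fence, or the trimmed text if there is none.
+func extractSQL(from response: String) -> String {
+    // Three backticks, built at runtime so this listing contains no literal fence.
+    let fence = String(repeating: "`", count: 3)
+    let parts = response.components(separatedBy: fence)
+    guard parts.count >= 3 else {
+        return response.trimmingCharacters(in: .whitespacesAndNewlines)
+    }
+    var body = parts[1]
+    // Drop a language tag such as "sql" that follows the opening fence.
+    if let newline = body.firstIndex(of: "\n") {
+        let tag = body[..<newline].trimmingCharacters(in: .whitespaces)
+        if !tag.contains(" ") { body = String(body[body.index(after: newline)...]) }
+    }
+    return body.trimmingCharacters(in: .whitespacesAndNewlines)
+}
+
+/// Step 2: detect the operation from the first keyword of the statement.
+func detectOperation(_ sql: String) -> SketchOperation? {
+    let firstWord = sql.split(whereSeparator: { $0.isWhitespace }).first.map { $0.uppercased() }
+    return firstWord.flatMap(SketchOperation.init(rawValue:))
+}
+
+/// Step 3: allow the statement only if its operation is in the allowlist.
+func isAllowed(_ sql: String, allowlist: Set<SketchOperation>) -> Bool {
+    guard let operation = detectOperation(sql) else { return false }
+    return allowlist.contains(operation)
+}
+
+let fence = String(repeating: "`", count: 3)
+let reply = fence + "sql\nSELECT * FROM tasks\n" + fence
+let sql = extractSQL(from: reply)
+print(sql)                                                    // SELECT * FROM tasks
+print(isAllowed(sql, allowlist: [.select]))                   // true
+print(isAllowed("DELETE FROM tasks", allowlist: [.select]))   // false
+```
+
+A first-keyword check like this says nothing about what follows the keyword, which is the limitation raised under Open Questions.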
+
+---
+
+## 6. Key APIs (Implemented)
+
+### 6.1 Setup (Minimal — Zero Config)
+
+```swift
+import SwiftUI
+import SwiftDBAI
+import AnyLanguageModel
+
+struct ContentView: View {
+    var body: some View {
+        // Just a database path and a model — that's it
+        DataChatView(
+            databasePath: "/path/to/mydata.sqlite",
+            model: OllamaLanguageModel(model: "llama3")
+        )
+    }
+}
+```
+
+### 6.2 Choosing a Provider (via AnyLanguageModel)
+
+```swift
+import AnyLanguageModel
+import SwiftDBAI
+
+// Each initializer below is an alternative; create only the one you need.
+
+// OpenAI
+let openAI = OpenAILanguageModel(apiKey: "sk-...", model: "gpt-4o")
+
+// Anthropic
+let anthropic = AnthropicLanguageModel(apiKey: "sk-ant-...", model: "claude-sonnet-4-20250514")
+
+// Ollama (local)
+let ollama = OllamaLanguageModel(model: "llama3")
+
+// Gemini
+let gemini = GeminiLanguageModel(apiKey: "...", model: "gemini-2.0-flash")
+
+// Pass the chosen model to DataChatView with options
+DataChatView(
+    databasePath: "/path/to/db.sqlite",
+    model: anthropic,
+    allowlist: .standard,
+    additionalContext: "This database stores a recipe app's data."
+)
+```
+
+### 6.3 Bringing Your Own GRDB Connection
+
+```swift
+import GRDB
+import SwiftDBAI
+
+// Developer manages their own connection
+let dbPool = try DatabasePool(path: "/path/to/mydata.sqlite")
+
+// Option A: DataChatView with existing connection
+DataChatView(
+    database: dbPool,
+    model: model,
+    allowlist: .readOnly
+)
+
+// Option B: Headless / programmatic use via ChatEngine
+let engine = ChatEngine(
+    database: dbPool,
+    model: model,
+    allowlist: .standard
+)
+
+let response = try await engine.send("How many tasks are overdue?")
+print(response.summary)     // "You have 12 overdue tasks."
+print(response.sql)         // "SELECT COUNT(*) FROM tasks WHERE dueDate < date('now')"
+print(response.queryResult) // QueryResult with columns, rows, execution time
+```
+
+### 6.4 Schema Introspection (Auto — Zero Config)
+
+```swift
+// Schema is introspected automatically on first query.
+// Or pre-warm it explicitly:
+let schema = try await engine.prepareSchema()
+
+// schema.tableNames → ["tasks", "projects", "users"]
+// schema.tables["tasks"]?.columns → [ColumnSchema(name: "id", type: "INTEGER", isPrimaryKey: true), ...]
+// schema.tables["tasks"]?.foreignKeys → [ForeignKeySchema(fromColumn: "projectId", toTable: "projects", ...)]
+// schema.schemaDescription → Compact text for LLM prompts
+
+// No @Model annotations, no #Predicate, no FetchDescriptor.
+// Just sqlite_master + PRAGMA introspection.
+```
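+
+For readers who want to see what this introspection amounts to, the sketch below lists tables from `sqlite_master` and reads their columns with `PRAGMA table_info`, using GRDB directly. It is a simplified illustration, not the `SchemaIntrospector` source: `SketchColumn` and `introspect` are hypothetical names, and foreign keys and indexes are left out.
+
+```swift
+import Foundation
+import GRDB
+
+// Hypothetical minimal model; the package's `DatabaseSchema` carries more detail.
+struct SketchColumn {
+    let name: String
+    let type: String
+    let isPrimaryKey: Bool
+}
+
+func introspect(_ dbQueue: DatabaseQueue) throws -> [String: [SketchColumn]] {
+    try dbQueue.read { db in
+        // User tables only: skip SQLite's internal sqlite_* tables.
+        let tables = try String.fetchAll(db, sql: """
+            SELECT name FROM sqlite_master
+            WHERE type = 'table' AND name NOT LIKE 'sqlite_%'
+            ORDER BY name
+            """)
+        var schema: [String: [SketchColumn]] = [:]
+        for table in tables {
+            // PRAGMA arguments cannot be bound, so the name is quoted as an identifier.
+            let quoted = "\"" + table.replacingOccurrences(of: "\"", with: "\"\"") + "\""
+            let rows = try Row.fetchAll(db, sql: "PRAGMA table_info(\(quoted))")
+            schema[table] = rows.map { row in
+                // `pk` is 0 for non-key columns and the 1-based key position otherwise.
+                SketchColumn(name: row["name"], type: row["type"], isPrimaryKey: (row["pk"] as Int) > 0)
+            }
+        }
+        return schema
+    }
+}
+
+// Usage with an in-memory database.
+let dbQueue = try DatabaseQueue()
+try dbQueue.write { db in
+    try db.execute(sql: "CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
+}
+let sketch = try introspect(dbQueue)
+print(sketch["tasks"]?.map(\.name) ?? [])   // ["id", "title"]
+```
+
+`PRAGMA foreign_key_list` and `PRAGMA index_list` can be read the same way, one call per table.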
+
+### 6.5 Operation Allowlist (Safety)
+
+```swift
+// Presets
+let readOnly = OperationAllowlist.readOnly          // SELECT only (default)
+let standard = OperationAllowlist.standard          // SELECT + INSERT + UPDATE
+let unrestricted = OperationAllowlist.unrestricted  // All including DELETE
+
+// Custom
+let custom = OperationAllowlist([.select, .insert]) // Only SELECT and INSERT
+
+// Pass to ChatEngine or DataChatView
+let engine = ChatEngine(
+    database: dbPool,
+    model: model,
+    allowlist: .standard
+)
+```
+
+### 6.6 Mutation Policy (Table-Level Control)
+
+```swift
+// Read-only (default)
+let readOnly = MutationPolicy.readOnly
+
+// Allow INSERT and UPDATE on specific tables only
+let restricted = MutationPolicy(
+    allowedOperations: [.insert, .update],
+    allowedTables: ["orders", "order_items"]
+)
+
+// Full access — DELETE requires confirmation by default
+let full = MutationPolicy.unrestricted
+
+let engine = ChatEngine(
+    database: dbPool,
+    model: model,
+    mutationPolicy: restricted
+)
+```
+
+### 6.7 Custom Query Validators
+
+```swift
+// Built-in: restrict queries to specific tables
+let tableValidator = TableAllowlistValidator(
+    allowedTables: ["tasks", "projects"]
+)
+
+// Built-in: enforce row limits on SELECT queries
+let limitValidator = MaxRowLimitValidator(maxRows: 1000)
+
+// Custom: implement QueryValidator protocol
+struct NoJoinValidator: QueryValidator {
+    func validate(sql: String, operation: SQLOperation) throws {
+        if sql.uppercased().contains("JOIN") {
+            throw QueryValidationError.rejected("JOIN queries are not allowed.")
+        }
+    }
+}
+
+let config = ChatEngineConfiguration(
+    validators: [tableValidator, limitValidator, NoJoinValidator()]
+)
+
+let engine = ChatEngine(
+    database: dbPool,
+    model: model,
+    allowlist: .readOnly,
+    configuration: config
+)
+```
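+
+Note that the substring test in `NoJoinValidator` also rejects queries that merely mention an identifier such as `joined_at`. A word-boundary match is one way to tighten it. The helper below is a hypothetical, standalone sketch (`containsJoinKeyword` is not part of the package), and it still matches `JOIN` inside string literals.
+
+```swift
+import Foundation
+
+/// Hypothetical helper: true when JOIN appears as a standalone keyword, in any letter case.
+func containsJoinKeyword(_ sql: String) -> Bool {
+    sql.range(of: #"\bJOIN\b"#, options: [.regularExpression, .caseInsensitive]) != nil
+}
+
+print(containsJoinKeyword("SELECT * FROM a JOIN b ON a.id = b.a_id"))   // true
+print(containsJoinKeyword("SELECT joined_at FROM users"))               // false
+```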
+
+### 6.8 Tool Execution Delegate (Destructive Operation Confirmation)
+
+```swift
+let engine = ChatEngine(
+    database: dbPool,
+    model: model,
+    allowlist: .unrestricted,
+    delegate: MyDelegate()
+)
+
+actor MyDelegate: ToolExecutionDelegate {
+    func confirmDestructiveOperation(_ context: DestructiveOperationContext) async -> Bool {
+        // Show confirmation UI, inspect context.sql, context.targetTable, etc.
+        return true  // or false to reject
+    }
+
+    func willExecuteSQL(_ sql: String, classification: SQLClassification) async {
+        // Observe before execution
+    }
+
+    func didExecuteSQL(_ sql: String, success: Bool) async {
+        // Observe after execution
+    }
+}
+```
+
+---
+
+## 7. Feature Requirements
+
+### P0 — Must Have (v1.0) — All Implemented
+
+| # | Feature | Description | Status |
+|---|---|---|---|
+| F1 | **Schema Discovery** | Auto-introspect all tables, columns (name, type, nullability, defaults), primary keys, foreign keys, and indexes from `sqlite_master` and PRAGMA statements. Zero config — no annotations needed. | Done |
+| F2 | **Natural Language to SQL** | Convert NL queries to SQL via LLM. The LLM generates raw SQL; no `#Predicate` or `FetchDescriptor` — pure SQL throughout. | Done |
+| F3 | **Result Rendering — Text** | `TextSummaryRenderer` uses the LLM to produce natural-language summaries of query results. | Done |
+| F4 | **Result Rendering — Data Tables** | `ScrollableDataTableView` renders query results as scrollable, structured tables in SwiftUI. | Done |
+| F5 | **Result Rendering — Swift Charts** | `ChartDataDetector` auto-detects chart-eligible results. `BarChartView`, `LineChartView`, `PieChartView` render via Swift Charts. | Done |
+| F6 | **Drop-in ChatView** | `DataChatView` (zero-config: path + model) and `ChatView` (full control via `ChatViewModel`). Message bubbles, loading states, error display. | Done |
+| F7 | **AnyLanguageModel Integration** | Uses HuggingFace's AnyLanguageModel for the LLM layer. `LanguageModelSession` for SQL generation and result summarization. | Done |
+| F8 | **SQL Safety — Operation Allowlist** | `OperationAllowlist` with presets (`.readOnly`, `.standard`, `.unrestricted`) and custom sets. Allowlist check only — no SQL parser for safety. | Done |
+| F9 | **SQL Safety — Mutation Policy** | `MutationPolicy` adds per-table restrictions on top of the allowlist. DELETE requires confirmation by default. | Done |
+| F10 | **SQL Safety — Custom Validators** | `QueryValidator` protocol with built-in `TableAllowlistValidator` and `MaxRowLimitValidator`. Extensible for domain-specific rules. | Done |
+| F11 | **Mutation Support** | INSERT, UPDATE, DELETE via SQL with allowlist validation and optional confirmation via `ToolExecutionDelegate`. | Done |
+| F12 | **Conversation Context** | Multi-turn support with configurable context window size. "Show overdue tasks" -> "Now sort them by priority" maintains history. | Done |
+| F13 | **Error Handling** | Typed `SwiftDBAIError` enum covering schema introspection failures, empty schemas, invalid SQL, disallowed operations, confirmation required, database errors, LLM failures, and query timeouts. | Done |
+
+### P1 — Should Have (v1.x)
+
+| # | Feature | Description |
+|---|---|---|
+| F14 | **On-Device Providers** | Guide for using Ollama, CoreML, MLX, or llama.cpp via AnyLanguageModel for fully offline / privacy-sensitive deployments |
+| F15 | **Chat History Persistence** | Optionally persist chat history to SQLite via GRDB |
+| F16 | **Theming API** | Customize colors, fonts, bubble styles, dark/light mode in ChatView |
+| F17 | **Streaming Responses** | Token-by-token display for cloud LLM providers |
+| F18 | **Export Results** | Copy/share query results as CSV, JSON, or formatted text |
+
+### P2 — Nice to Have (v2.0+)
+
+| # | Feature | Description |
+|---|---|---|
+| F19 | **Voice Input** | Speech-to-text for hands-free data queries |
+| F20 | **MCP Server Mode** | Expose any SQLite database as an MCP server so external LLM clients can query it |
+| F21 | **Suggested Questions** | Auto-generate starter questions based on introspected schema |
+| F22 | **Audit Log** | Log all mutations with timestamp, before/after values |
+| F23 | **Multi-Database** | Support querying across multiple SQLite databases simultaneously |
+
+---
+
+## 8. Privacy & Security
+
+| Concern | Approach |
+|---|---|
+| **Provider choice is yours** | Use Ollama or a self-hosted model to keep data off third-party servers |
+| **No telemetry** | The package collects nothing |
+| **API key handling** | Cloud provider keys are never persisted by the package; the developer is responsible for secure storage |
+| **SQL safety** | Developer-configured `OperationAllowlist` controls what SQL the LLM may generate. Allowlist check only — no attempt at SQL parsing for injection prevention. The developer is responsible for setting appropriate allowlist levels. |
+| **Mutation safety** | `MutationPolicy` provides per-table restrictions. DELETE requires explicit confirmation by default via `ToolExecutionDelegate`. |
+| **Data handling** | The package does not write query results to disk. The schema description in the system prompt and the result rows passed to `TextSummaryRenderer` are sent to the configured LLM provider, so they leave the device when a cloud provider is used |
+| **Connection ownership** | When the developer passes in a GRDB `DatabasePool`/`DatabaseQueue`, the developer owns it. SwiftDBAI never closes or migrates a connection it is given. |
+
+---
+
+## 9. Technical Constraints
+
+- **Swift Package Manager** only (no CocoaPods/Carthage)
+- **Minimum deployments:** iOS 17.0, macOS 14.0, visionOS 1.0
+- **Swift 6** language mode (Swift 6.1 toolchain) with strict concurrency checking
+- **Dependencies:** GRDB.swift 7.0+ and AnyLanguageModel (branch: main)
+- **No UIKit dependency** — pure SwiftUI for the view layer
+- **No SwiftData dependency** — pure GRDB/SQL throughout. Works with any SQLite database regardless of how it was created.
+- **No Core Data dependency** — no ORM layer of any kind
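+
+Expressed as a package manifest, these constraints would look roughly like the sketch below. The target and product names are assumptions, and the manifest reflects one reading of the constraints: Swift tools version 6.1 with the Swift 6 language mode.
+
+```swift
+// swift-tools-version: 6.1
+import PackageDescription
+
+let package = Package(
+    name: "SwiftDBAI",
+    platforms: [.iOS(.v17), .macOS(.v14), .visionOS(.v1)],
+    products: [
+        .library(name: "SwiftDBAI", targets: ["SwiftDBAI"])
+    ],
+    dependencies: [
+        .package(url: "https://github.com/groue/GRDB.swift", from: "7.0.0"),
+        // Tracks a moving branch rather than a pinned version.
+        .package(url: "https://github.com/huggingface/AnyLanguageModel", branch: "main")
+    ],
+    targets: [
+        .target(
+            name: "SwiftDBAI",
+            dependencies: [
+                .product(name: "GRDB", package: "GRDB.swift"),
+                .product(name: "AnyLanguageModel", package: "AnyLanguageModel")
+            ]
+        ),
+        // Assumed test target name.
+        .testTarget(name: "SwiftDBAITests", dependencies: ["SwiftDBAI"])
+    ],
+    swiftLanguageModes: [.v6]
+)
+```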
+
+---
+
+## 10. Implementation Status
+
+| Metric | Current |
+|---|---|
+| Source files | 30 |
+| Test files | 19 |
+| Tests passing | 352 |
+| Swift language mode | 6 (Swift 6.1 toolchain) |
+| Dependencies | GRDB.swift 7.0+, AnyLanguageModel |
+
+---
+
+## 11. Success Metrics
+
+| Metric | Target |
+|---|---|
+| Integration time | < 5 minutes for basic "chat with my data" — provide a database path and a model |
+| Query accuracy | > 90% of common queries (SELECT with filters, sorting, aggregates) produce correct SQL on first attempt |
+| Latency (package overhead) | < 500 ms for schema introspection + SQL validation on a typical 20-table database (excludes LLM response time) |
+| Package size | < 2 MB added to app binary (excluding LLM model weights) |
+| Crash rate | 0 crashes from package code in production |
+
+---
+
+## 12. Open Questions
+
+1. **AnyLanguageModel maturity**: The library is relatively new, so we need to track API stability and pin to a specific version or commit. The dependency currently tracks `branch: main`, which is a moving target rather than a pin. What's our fallback if breaking changes land?
+2. **SQL injection surface** — The allowlist check validates operation type but does not parse SQL structure. Should we add a lightweight SQL tokenizer for additional safety, or is the allowlist sufficient given the LLM is the only SQL author?
+3. **Schema change detection** — `SchemaIntrospector` caches the schema after first introspection. If the database schema changes at runtime (migrations, etc.), the cache becomes stale. Should we add a `schema_version` PRAGMA check or a manual invalidation API?
+4. **Large schema handling** — For databases with many tables (100+), the schema description in the LLM system prompt may be very large. Should we add table filtering or relevance ranking?
+5. **Chart auto-detection accuracy** — `ChartDataDetector` heuristically determines if results are chart-eligible. How do we handle false positives/negatives?
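+
+For question 3, one possible shape of the `schema_version` check is sketched below. SQLite increments this value whenever the schema is modified, so comparing it before each query is a cheap staleness test. `SchemaVersionTracker` is a hypothetical helper, not an existing SwiftDBAI type, and the sketch makes no attempt at thread safety.
+
+```swift
+import GRDB
+
+/// Hypothetical helper: reports whether the schema changed since the previous check.
+final class SchemaVersionTracker {
+    private var lastSeen: Int?
+
+    func hasChanged(in reader: some DatabaseReader) throws -> Bool {
+        let current = try reader.read { db in
+            try Int.fetchOne(db, sql: "PRAGMA schema_version")
+        }
+        // Remember the current value after computing the result.
+        defer { lastSeen = current }
+        // The first call only records a baseline and reports no change.
+        return lastSeen != nil && lastSeen != current
+    }
+}
+
+// Usage with an in-memory database.
+let trackedQueue = try DatabaseQueue()
+let tracker = SchemaVersionTracker()
+let changedAtBaseline = try tracker.hasChanged(in: trackedQueue)
+try trackedQueue.write { db in
+    try db.execute(sql: "CREATE TABLE t (id INTEGER PRIMARY KEY)")
+}
+let changedAfterCreate = try tracker.hasChanged(in: trackedQueue)
+print(changedAtBaseline, changedAfterCreate)
+```
+
+A cached `DatabaseSchema` could be dropped and rebuilt whenever `hasChanged` returns `true`. A manual invalidation API would remain useful for callers who already know a migration has just run.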
+
+---
+
+## 13. Milestones
+
+| Milestone | Scope | Status |
+|---|---|---|
+| **M1: Foundation** | SchemaIntrospector + SQLQueryParser + headless ChatEngine | Done |
+| **M2: Safety** | OperationAllowlist + MutationPolicy + QueryValidator + ToolExecutionDelegate | Done |
+| **M3: Chat UI** | DataChatView + ChatView + ChatViewModel + MessageBubbleView + ErrorMessageView | Done |
+| **M4: Rendering** | TextSummaryRenderer + ScrollableDataTableView + ChartDataDetector + Bar/Line/Pie charts | Done |
+| **M5: Multi-turn** | ConversationHistory + context window + PromptBuilder with history | Done |
+| **M6: Polish & Ship** | Error handling (SwiftDBAIError), 352 tests, documentation | Done |
+
+---
+
+## 14. References
+
+- [GRDB.swift](https://github.com/groue/GRDB.swift) — SQLite toolkit for Swift
+- [AnyLanguageModel (HuggingFace)](https://github.com/huggingface/AnyLanguageModel) — Unified Swift LLM abstraction
+- [Swift Charts](https://developer.apple.com/documentation/charts) — Apple's declarative charting framework
+- [Model Context Protocol (MCP)](https://modelcontextprotocol.io) — For future MCP server mode
+- [Swift Package Manager](https://www.swift.org/documentation/package-manager/)
+- [SQLite PRAGMA Statements](https://www.sqlite.org/pragma.html) — Used for schema introspection