Initial implementation of SwiftDBAI

Chat with any SQLite database using natural language. Built on
AnyLanguageModel (HuggingFace) for LLM-agnostic provider support
and GRDB for SQLite access.

Core features:
- Auto schema introspection from sqlite_master (zero config)
- NL → SQL generation via any AnyLanguageModel provider
- Three rendering modes: text summary, data table, Swift Charts
- Drop-in DataChatView (SwiftUI) and headless ChatEngine
- Operation allowlist with read-only default
- Mutation policy with per-table control
- ToolExecutionDelegate for destructive operation confirmation
- Multi-turn conversation context
- 352 tests across 24 suites, all passing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Krishna Kumar
2026-04-04 09:30:56 -05:00
commit b1724fe7ca
55 changed files with 15506 additions and 0 deletions

PRD.md (new file)

@@ -0,0 +1,468 @@
# SwiftDBAI — Product Requirements Document

> **SwiftDBAI** is the umbrella name for AI-powered SQLite database tooling. v1 ships `SwiftDBAI` (chat + SQL engine). Future versions may add `SwiftDBAIMCP` (MCP server mode).

**Version:** 0.2 (Revised — post-pivot from SwiftDataAI)

**Date:** 2026-04-04

**Author:** Krishna Kumar

---
## 1. Problem Statement
Developers building apps with SQLite databases have no natural-language interface to query, explore, or mutate their data. Debugging, prototyping, and building AI-powered features all require hand-writing SQL — even for simple questions like "show me all overdue tasks" or "how many users signed up this week."

There is no drop-in Swift package that lets a user (or an LLM) **chat with any SQLite database** using plain English.

---
## 2. Vision
**SwiftDBAI** is a Swift package that gives any SQLite-backed app a conversational interface to its data. Developers embed it in minutes; end users ask questions and get answers from their own data.

The data layer is **all SQL via GRDB** — no SwiftData APIs, no `#Predicate`, no `FetchDescriptor`. SwiftDBAI works with **any SQLite database**, not just SwiftData stores. Schema discovery is automatic via `sqlite_master` introspection — zero configuration required. The developer can pass their own GRDB `DatabasePool` or `DatabaseQueue`, in which case SwiftDBAI never manages the connection lifecycle.

Built on [**AnyLanguageModel**](https://github.com/huggingface/AnyLanguageModel) from Hugging Face — a unified Swift LLM abstraction that supports OpenAI, Anthropic, Gemini, Ollama, CoreML, MLX, and llama.cpp through a single API. SwiftDBAI generates SQL from natural language, validates it against a developer-configured operation allowlist, executes it via GRDB, and renders results as text, data tables, or Swift Charts.

---
## 3. Target Users
| Persona | Need |
|---|---|
| **iOS/macOS Developer** | Drop-in chat UI + engine to add "talk to your data" features to any SQLite-backed app without building NLP pipelines |
| **AI/LLM App Builder** | SQL generation layer that lets an LLM read/write any SQLite database through validated, allowlisted operations |
| **Power User / Debugger** | In-app console to inspect and mutate SQLite data during development |
---
## 4. Goals & Non-Goals
### Goals
- Natural-language querying of any SQLite database via GRDB
- **LLM-agnostic** via [AnyLanguageModel](https://github.com/huggingface/AnyLanguageModel) — works with OpenAI, Anthropic, Gemini, Ollama, CoreML, MLX, llama.cpp out of the box
- Drop-in SwiftUI chat view that "just works" with zero configuration — provide a database path and a model
- Schema-aware — automatically introspects tables, columns, types, primary keys, foreign keys, and indexes from `sqlite_master`
- Read **and** write support (SELECT, INSERT, UPDATE, DELETE) with developer-configured operation allowlist and confirmation guards
- All SQL validation via allowlist check — no SQL parser for safety, no `#Predicate` generation
- UI rendering: text summaries + scrollable data tables + Swift Charts (bar, line, pie) — all in v1
- Swift 6 concurrency safe, structured concurrency throughout (Swift 6 language mode, Swift 6.1 toolchain)
- Works on iOS 17+, macOS 14+, visionOS 1+
### Non-Goals (v1)
- ~~Replacing Core Data~~ Not tied to any ORM — works with raw SQLite
- Building a general-purpose chat framework (data-scoped only)
- Full SQL parsing for safety (allowlist check is sufficient)
- Training or fine-tuning models
- Cloud sync of chat history
- Managing database connections (developer owns the GRDB connection)
---
## 5. Architecture Overview
```
┌──────────────────────────────────────────────────────────┐
│                        SwiftDBAI                         │
├──────────┬───────────┬──────────────┬────────────────────┤
│ Chat UI  │  Engine   │    Schema    │    SQL Pipeline    │
│ (SwiftUI)│           │ Introspector │                    │
└────┬─────┴─────┬─────┴──────┬───────┴───────┬────────────┘
     │           │            │               │
     ▼           ▼            ▼               ▼
 ChatView   ChatEngine     sqlite_master     SQLQueryParser
 DataChat   PromptBuilder  PRAGMA            OperationAllowlist
 View       TextSummary    table_info        MutationPolicy
            Renderer       foreign_key_list  QueryValidator
                           index_list
┌──────────────────────────────────────────────────────────┐
│                     Rendering Layer                      │
├──────────┬──────────────┬────────────────────────────────┤
│ Text     │  DataTable   │          Swift Charts          │
│ Summary  │ (scrollable) │        (Bar, Line, Pie)        │
└──────────┴──────────────┴────────────────────────────────┘
┌──────────────────────────────────────────────────────────┐
│                     GRDB.swift 7.0+                      │
│               DatabasePool / DatabaseQueue               │
└──────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────┐
│              AnyLanguageModel (HuggingFace)              │
├──────┬──────┬────────┬───────┬───────┬──────┬────────────┤
│OpenAI│Claude│ Gemini │Ollama │CoreML │ MLX  │ llama.cpp  │
└──────┴──────┴────────┴───────┴───────┴──────┴────────────┘
```
### 5.1 Core Modules
| Module | Responsibility |
|---|---|
| **SchemaIntrospector** | Queries `sqlite_master`, `PRAGMA table_info`, `PRAGMA foreign_key_list`, and `PRAGMA index_list` to auto-discover all tables, columns (name, type, nullability, defaults), primary keys, foreign keys, and indexes. Produces a `DatabaseSchema` model the LLM uses as context. Zero configuration — no annotations or model definitions needed. |
| **SQLQueryParser** | Extracts SQL from the raw LLM response, detects the operation type (SELECT/INSERT/UPDATE/DELETE), validates it against the `OperationAllowlist`, enforces `MutationPolicy` table restrictions, and flags destructive operations that require confirmation. |
| **OperationAllowlist** | Developer-configured set of permitted SQL operations. Presets: `.readOnly` (SELECT only, the default), `.standard` (SELECT + INSERT + UPDATE), `.unrestricted` (all including DELETE). |
| **MutationPolicy** | Builds on `OperationAllowlist` with per-table restrictions. Controls which mutations are allowed on which tables. DELETE requires confirmation by default. |
| **QueryValidator** | Extensible protocol for custom pre-execution validation rules (e.g., `TableAllowlistValidator`, `MaxRowLimitValidator`). Developers implement `QueryValidator` to add domain-specific checks. |
| **ChatEngine** | Orchestrates the full pipeline: schema introspection (once, lazily) -> system prompt with schema context -> LLM generates SQL -> `SQLQueryParser` validates -> GRDB executes -> `TextSummaryRenderer` summarizes -> response. Supports multi-turn conversation with configurable context window. |
| **PromptBuilder** | Constructs the LLM system prompt including the introspected schema description, allowlist rules, and optional developer-provided context. |
| **TextSummaryRenderer** | Uses the LLM to generate natural-language summaries of query results. Configurable max rows for summarization. |
| **ChatView / DataChatView** | Drop-in SwiftUI views. `DataChatView` is the zero-config entry point (database path + model). `ChatView` accepts a `ChatViewModel` for full control. Renders message bubbles, scrollable data tables, Swift Charts (bar/line/pie via `ChartDataDetector`), and error states. |
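
The "configurable context window" in the `ChatEngine` row can be pictured as a sliding window over past turns. The sketch below is only an illustration under that assumption: `ChatTurn` and `ConversationWindow` are hypothetical names, not types confirmed by this document.

```swift
// Hypothetical sketch; these names are illustrative, not SwiftDBAI's API.
struct ChatTurn: Equatable {
    enum Role { case user, assistant }
    let role: Role
    let text: String
}

struct ConversationWindow {
    /// Maximum number of turns replayed to the LLM on each request.
    let maxTurns: Int
    private(set) var turns: [ChatTurn] = []

    init(maxTurns: Int) {
        self.maxTurns = max(0, maxTurns)
    }

    mutating func append(_ turn: ChatTurn) {
        turns.append(turn)
        // Drop the oldest turns once the window is full.
        if turns.count > maxTurns {
            turns.removeFirst(turns.count - maxTurns)
        }
    }
}

var window = ConversationWindow(maxTurns: 2)
window.append(ChatTurn(role: .user, text: "Show overdue tasks"))
window.append(ChatTurn(role: .assistant, text: "SELECT ..."))
window.append(ChatTurn(role: .user, text: "Now sort them by priority"))
// Only the two most recent turns remain in `window.turns`.
```

Counting turns is the simplest policy; a token budget would be a natural alternative when turns vary a lot in length.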
### 5.2 Data Flow
```
User types: "Show me all tasks due this week"
  ↓
ChatEngine ensures schema is introspected (via SchemaIntrospector)
  - Queries sqlite_master, PRAGMA table_info, foreign_key_list, index_list
  - Caches DatabaseSchema for subsequent queries
  ↓
PromptBuilder constructs system prompt with:
  - Full schema description (tables, columns, types, keys, indexes)
  - OperationAllowlist rules
  - Optional developer context
  - Conversation history (within context window)
  ↓
LanguageModelSession.respond(to: userMessage)
  → AnyLanguageModel routes to configured provider (OpenAI / Anthropic / Ollama / ...)
  ↓
LLM returns raw SQL (Monday-to-Sunday week, dueDate stored as YYYY-MM-DD):
  "SELECT * FROM tasks WHERE dueDate BETWEEN date('now', 'weekday 0', '-6 days') AND date('now', 'weekday 0') ORDER BY dueDate ASC"
  ↓
SQLQueryParser:
  1. Extracts SQL from LLM response (strips markdown fences, etc.)
  2. Detects operation type → SELECT
  3. Validates against OperationAllowlist → allowed
  4. Checks MutationPolicy table restrictions (if applicable)
  5. Runs custom QueryValidators
  ↓
GRDB executes SQL via DatabasePool/DatabaseQueue
  → Returns rows as [[String: Value]] with column names
  ↓
TextSummaryRenderer asks LLM to summarize results in natural language
  ↓
ChartDataDetector checks if results are chart-eligible
  ↓
ChatView renders: text summary + scrollable DataTable + Swift Charts (if applicable)
```
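
Steps 1 and 2 of the `SQLQueryParser` stage (fence stripping and operation detection) might look roughly like the following. This is a minimal sketch, not the package's parser: `extractSQL`, `detectOperation`, and this `SQLOperation` enum are assumptions made for illustration, and classification by leading keyword deliberately rejects anything it does not recognize (including CTEs that start with `WITH`).

```swift
import Foundation

// Hypothetical names for illustration; SwiftDBAI's real parser may differ.
enum SQLOperation: String {
    case select = "SELECT", insert = "INSERT", update = "UPDATE", delete = "DELETE"
}

/// A Markdown code fence (three backticks), built at runtime for readability.
let fence = String(repeating: "`", count: 3)

/// Strips a surrounding Markdown code fence, with or without a language tag.
func extractSQL(from response: String) -> String {
    var lines = response
        .trimmingCharacters(in: .whitespacesAndNewlines)
        .components(separatedBy: "\n")
    if let first = lines.first, first.hasPrefix(fence) { lines.removeFirst() }
    if let last = lines.last, last.trimmingCharacters(in: .whitespaces) == fence {
        lines.removeLast()
    }
    return lines.joined(separator: "\n")
        .trimmingCharacters(in: .whitespacesAndNewlines)
}

/// Classifies a statement by its leading keyword; nil means "unrecognized, reject".
func detectOperation(in sql: String) -> SQLOperation? {
    let firstWord = sql.split(whereSeparator: { $0.isWhitespace }).first.map { $0.uppercased() }
    return firstWord.flatMap(SQLOperation.init(rawValue:))
}

let raw = "\(fence)sql\nSELECT COUNT(*) FROM tasks\n\(fence)"
let sql = extractSQL(from: raw)
let operation = detectOperation(in: sql)
```

Treating unknown leading keywords as a rejection keeps the default conservative, which matches the read-only default described in this document.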
---
## 6. Key APIs (Implemented)
### 6.1 Setup (Minimal — Zero Config)
```swift
import SwiftUI
import SwiftDBAI
import AnyLanguageModel

struct ContentView: View {
    var body: some View {
        // Just a database path and a model; that's it.
        DataChatView(
            databasePath: "/path/to/mydata.sqlite",
            model: OllamaLanguageModel(model: "llama3")
        )
    }
}
```
### 6.2 Choosing a Provider (via AnyLanguageModel)
```swift
import AnyLanguageModel
// Pick one provider; each `let model` below is an alternative, not a sequence.

// OpenAI
let model = OpenAILanguageModel(apiKey: "sk-...", model: "gpt-4o")

// Anthropic
let model = AnthropicLanguageModel(apiKey: "sk-ant-...", model: "claude-sonnet-4-20250514")

// Ollama (local)
let model = OllamaLanguageModel(model: "llama3")

// Gemini
let model = GeminiLanguageModel(apiKey: "...", model: "gemini-2.0-flash")

// Pass to DataChatView with options
DataChatView(
    databasePath: "/path/to/db.sqlite",
    model: model,
    allowlist: .standard,
    additionalContext: "This database stores a recipe app's data."
)
```
### 6.3 Bringing Your Own GRDB Connection
```swift
import GRDB
import SwiftDBAI
// Developer manages their own connection
let dbPool = try DatabasePool(path: "/path/to/mydata.sqlite")

// Option A: DataChatView with existing connection
DataChatView(
    database: dbPool,
    model: model,
    allowlist: .readOnly
)

// Option B: Headless / programmatic use via ChatEngine
let engine = ChatEngine(
    database: dbPool,
    model: model,
    allowlist: .standard
)
let response = try await engine.send("How many tasks are overdue?")
print(response.summary)      // "You have 12 overdue tasks."
print(response.sql)          // "SELECT COUNT(*) FROM tasks WHERE dueDate < date('now')"
print(response.queryResult)  // QueryResult with columns, rows, execution time
```
### 6.4 Schema Introspection (Auto — Zero Config)
```swift
// Schema is introspected automatically on first query.
// Or pre-warm it explicitly:
let schema = try await engine.prepareSchema()
// schema.tableNames                  → ["tasks", "projects", "users"]
// schema.tables["tasks"]?.columns     → [ColumnSchema(name: "id", type: "INTEGER", isPrimaryKey: true), ...]
// schema.tables["tasks"]?.foreignKeys → [ForeignKeySchema(fromColumn: "projectId", toTable: "projects", ...)]
// schema.schemaDescription            → Compact text for LLM prompts
// No @Model annotations, no #Predicate, no FetchDescriptor.
// Just sqlite_master + PRAGMA introspection.
```
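
To make "compact text for LLM prompts" concrete, here is one way such a description could be rendered from introspected metadata. It is a sketch under assumptions: `ColumnInfo`, `TableInfo`, and `describe` are hypothetical stand-ins, and the actual output format of `schemaDescription` is not specified in this document.

```swift
// Hypothetical types for illustration; not the package's actual DatabaseSchema model.
struct ColumnInfo {
    let name: String
    let type: String
    var isPrimaryKey = false
    var isNotNull = false
}

struct TableInfo {
    let name: String
    let columns: [ColumnInfo]
}

/// Renders one line per table, e.g. `tasks(id INTEGER PK, title TEXT NOT NULL)`.
func describe(_ tables: [TableInfo]) -> String {
    tables.map { table -> String in
        let columns = table.columns.map { column -> String in
            var parts = [column.name, column.type]
            if column.isPrimaryKey { parts.append("PK") }
            if column.isNotNull { parts.append("NOT NULL") }
            return parts.joined(separator: " ")
        }
        return "\(table.name)(\(columns.joined(separator: ", ")))"
    }
    .joined(separator: "\n")
}

let description = describe([
    TableInfo(name: "tasks", columns: [
        ColumnInfo(name: "id", type: "INTEGER", isPrimaryKey: true),
        ColumnInfo(name: "title", type: "TEXT", isNotNull: true),
    ]),
    TableInfo(name: "projects", columns: [
        ColumnInfo(name: "id", type: "INTEGER", isPrimaryKey: true),
    ]),
])
```

A one-line-per-table format keeps the prompt short, which matters for the large-schema concern raised under Open Questions.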
### 6.5 Operation Allowlist (Safety)
```swift
// Presets
let readOnly = OperationAllowlist.readOnly          // SELECT only (default)
let standard = OperationAllowlist.standard          // SELECT + INSERT + UPDATE
let unrestricted = OperationAllowlist.unrestricted  // All including DELETE

// Custom
let custom = OperationAllowlist([.select, .insert]) // Only SELECT and INSERT

// Pass to ChatEngine or DataChatView
let engine = ChatEngine(
    database: dbPool,
    model: model,
    allowlist: .standard
)
```
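
Conceptually, the presets above behave like named sets of operations. The following is a minimal re-creation for illustration only; `Allowlist` and this `SQLOperation` enum are hypothetical and are not claimed to match how `OperationAllowlist` is implemented.

```swift
// Hypothetical re-creation for illustration; the shipped type may be shaped differently.
enum SQLOperation: Hashable, Sendable { case select, insert, update, delete }

struct Allowlist: Sendable {
    let operations: Set<SQLOperation>

    init(_ operations: Set<SQLOperation>) { self.operations = operations }

    // Presets mirroring the three levels described in this section.
    static let readOnly = Allowlist([.select])
    static let standard = Allowlist([.select, .insert, .update])
    static let unrestricted = Allowlist([.select, .insert, .update, .delete])

    /// True when the detected operation is in the configured set.
    func permits(_ operation: SQLOperation) -> Bool {
        operations.contains(operation)
    }
}

let custom = Allowlist([.select, .insert])
let canDelete = custom.permits(.delete)   // false: DELETE is not in the custom set
```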
### 6.6 Mutation Policy (Table-Level Control)
```swift
// Read-only (default)
let readOnly = MutationPolicy.readOnly

// Allow INSERT and UPDATE on specific tables only
let restricted = MutationPolicy(
    allowedOperations: [.insert, .update],
    allowedTables: ["orders", "order_items"]
)

// Full access; DELETE requires confirmation by default
let full = MutationPolicy.unrestricted

let engine = ChatEngine(
    database: dbPool,
    model: model,
    mutationPolicy: restricted
)
```
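
The decision a per-table policy has to make can be sketched as a small function returning allow, deny, or "ask first". Everything here is an assumption for illustration: `TablePolicy` and `PolicyDecision` are hypothetical, and the target table is passed in by the caller rather than parsed from SQL.

```swift
// Hypothetical sketch of a per-table decision; not the package's MutationPolicy.
enum SQLOperation: Hashable, Sendable { case select, insert, update, delete }

enum PolicyDecision: Equatable { case allow, requireConfirmation, deny }

struct TablePolicy {
    let allowedOperations: Set<SQLOperation>
    /// nil means "no table restriction". Names are expected in lowercase.
    let allowedTables: Set<String>?
    var confirmDeletes = true

    func decide(_ operation: SQLOperation, on table: String) -> PolicyDecision {
        // Reads are outside the scope of a mutation policy.
        if operation == .select { return .allow }
        guard allowedOperations.contains(operation) else { return .deny }
        // SQLite table names are case-insensitive, so compare in lowercase.
        if let allowedTables, !allowedTables.contains(table.lowercased()) { return .deny }
        return (operation == .delete && confirmDeletes) ? .requireConfirmation : .allow
    }
}

let restrictedPolicy = TablePolicy(
    allowedOperations: [.insert, .update],
    allowedTables: ["orders", "order_items"]
)
let fullPolicy = TablePolicy(allowedOperations: [.insert, .update, .delete], allowedTables: nil)
```

Returning a three-way decision instead of a Boolean leaves room for the confirmation step that a delegate handles for destructive operations.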
### 6.7 Custom Query Validators
```swift
// Built-in: restrict queries to specific tables
let tableValidator = TableAllowlistValidator(
    allowedTables: ["tasks", "projects"]
)

// Built-in: enforce row limits on SELECT queries
let limitValidator = MaxRowLimitValidator(maxRows: 1000)

// Custom: implement QueryValidator protocol
struct NoJoinValidator: QueryValidator {
    func validate(sql: String, operation: SQLOperation) throws {
        // Match JOIN as a whole word so identifiers like "joined_at" pass.
        // (String.range(of:options:) comes from Foundation.)
        if sql.range(of: #"\bJOIN\b"#, options: [.regularExpression, .caseInsensitive]) != nil {
            throw QueryValidationError.rejected("JOIN queries are not allowed.")
        }
    }
}

let config = ChatEngineConfiguration(
    validators: [tableValidator, limitValidator, NoJoinValidator()]
)
let engine = ChatEngine(
    database: dbPool,
    model: model,
    allowlist: .readOnly,
    configuration: config
)
```
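
For a sense of what a row-limit rule has to inspect, the sketch below checks for a `LIMIT n` clause with a regular expression. It is not how `MaxRowLimitValidator` is implemented (this document does not say); `checkRowLimit` and `RowLimitError` are hypothetical, and a regex check like this only sees the first `LIMIT` in the text, so it can be fooled by subqueries or string literals.

```swift
import Foundation

// Hypothetical error type for this sketch.
struct RowLimitError: Error, Equatable { let message: String }

/// Hypothetical check: require `LIMIT n` with n <= maxRows.
func checkRowLimit(sql: String, maxRows: Int) throws {
    let pattern = #"\bLIMIT\s+(\d+)"#
    guard let match = sql.range(of: pattern, options: [.regularExpression, .caseInsensitive]) else {
        throw RowLimitError(message: "Query has no LIMIT clause.")
    }
    // The matched text is "LIMIT <digits>"; the number is its last word.
    let digits = sql[match].split(whereSeparator: { $0.isWhitespace }).last ?? ""
    guard let limit = Int(digits), limit <= maxRows else {
        throw RowLimitError(message: "LIMIT exceeds \(maxRows) rows.")
    }
}

// Passes: the limit is present and within bounds.
let accepted = (try? checkRowLimit(sql: "SELECT * FROM tasks LIMIT 50", maxRows: 1000)) != nil
// Fails: no LIMIT clause at all.
let rejected = (try? checkRowLimit(sql: "SELECT * FROM tasks", maxRows: 1000)) == nil
```

An alternative design would append or clamp the `LIMIT` instead of rejecting the query; which one the built-in validator does is not stated here.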
### 6.8 Tool Execution Delegate (Destructive Operation Confirmation)
```swift
let engine = ChatEngine(
    database: dbPool,
    model: model,
    allowlist: .unrestricted,
    delegate: MyDelegate()
)

actor MyDelegate: ToolExecutionDelegate {
    func confirmDestructiveOperation(_ context: DestructiveOperationContext) async -> Bool {
        // Show confirmation UI, inspect context.sql, context.targetTable, etc.
        return true // or false to reject
    }

    func willExecuteSQL(_ sql: String, classification: SQLClassification) async {
        // Observe before execution
    }

    func didExecuteSQL(_ sql: String, success: Bool) async {
        // Observe after execution
    }
}
```
---
## 7. Feature Requirements
### P0 — Must Have (v1.0) — All Implemented
| # | Feature | Description | Status |
|---|---|---|---|
| F1 | **Schema Discovery** | Auto-introspect all tables, columns (name, type, nullability, defaults), primary keys, foreign keys, and indexes from `sqlite_master` and PRAGMA statements. Zero config — no annotations needed. | Done |
| F2 | **Natural Language to SQL** | Convert NL queries to SQL via LLM. The LLM generates raw SQL; no `#Predicate` or `FetchDescriptor` — pure SQL throughout. | Done |
| F3 | **Result Rendering — Text** | `TextSummaryRenderer` uses the LLM to produce natural-language summaries of query results. | Done |
| F4 | **Result Rendering — Data Tables** | `ScrollableDataTableView` renders query results as scrollable, structured tables in SwiftUI. | Done |
| F5 | **Result Rendering — Swift Charts** | `ChartDataDetector` auto-detects chart-eligible results. `BarChartView`, `LineChartView`, `PieChartView` render via Swift Charts. | Done |
| F6 | **Drop-in ChatView** | `DataChatView` (zero-config: path + model) and `ChatView` (full control via `ChatViewModel`). Message bubbles, loading states, error display. | Done |
| F7 | **AnyLanguageModel Integration** | Uses HuggingFace's AnyLanguageModel for the LLM layer. `LanguageModelSession` for SQL generation and result summarization. | Done |
| F8 | **SQL Safety — Operation Allowlist** | `OperationAllowlist` with presets (`.readOnly`, `.standard`, `.unrestricted`) and custom sets. Allowlist check only — no SQL parser for safety. | Done |
| F9 | **SQL Safety — Mutation Policy** | `MutationPolicy` adds per-table restrictions on top of the allowlist. DELETE requires confirmation by default. | Done |
| F10 | **SQL Safety — Custom Validators** | `QueryValidator` protocol with built-in `TableAllowlistValidator` and `MaxRowLimitValidator`. Extensible for domain-specific rules. | Done |
| F11 | **Mutation Support** | INSERT, UPDATE, DELETE via SQL with allowlist validation and optional confirmation via `ToolExecutionDelegate`. | Done |
| F12 | **Conversation Context** | Multi-turn support with configurable context window size. "Show overdue tasks" -> "Now sort them by priority" maintains history. | Done |
| F13 | **Error Handling** | Typed `SwiftDBAIError` enum covering schema introspection failures, empty schemas, invalid SQL, disallowed operations, confirmation required, database errors, LLM failures, and query timeouts. | Done |
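
F5 describes `ChartDataDetector` as auto-detecting chart-eligible results without stating its rules. As a rough illustration of what such a heuristic could look like, the sketch below assumes a two-column result (label plus number); `Cell`, `ChartKind`, `suggestChart`, and the thresholds are all hypothetical.

```swift
// Hypothetical heuristic for illustration; not the package's ChartDataDetector.
enum Cell { case text(String), number(Double), null }
enum ChartKind: Equatable { case bar, line, pie, none }

/// Suggests a chart for results shaped as one label column plus one numeric column.
func suggestChart(columns: [String], rows: [[Cell]]) -> ChartKind {
    guard columns.count == 2, (2...50).contains(rows.count) else { return .none }
    var labels: [String] = []
    var values: [Double] = []
    for row in rows {
        guard row.count == 2,
              case .text(let label) = row[0],
              case .number(let value) = row[1] else { return .none }
        labels.append(label)
        values.append(value)
    }
    // Labels that start like "2026-04" suggest a time series.
    let looksLikeDates = labels.allSatisfy { label in
        let prefix = label.prefix(7)
        return prefix.count == 7
            && prefix.dropFirst(4).first == "-"
            && prefix.prefix(4).allSatisfy(\.isNumber)
    }
    if looksLikeDates { return .line }
    // A handful of non-negative values reads well as parts of a whole.
    if rows.count <= 6, values.allSatisfy({ $0 >= 0 }) { return .pie }
    return .bar
}

let byStatus = suggestChart(
    columns: ["status", "count"],
    rows: [[.text("open"), .number(12)], [.text("done"), .number(30)]]
)
```

Heuristics like these are where the false positives and negatives mentioned under Open Questions would come from, so any real detector would likely need an override.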
### P1 — Should Have (v1.x)
| # | Feature | Description |
|---|---|---|
| F14 | **On-Device Providers** | Guide for using Ollama, CoreML, MLX, or llama.cpp via AnyLanguageModel for fully offline / privacy-sensitive deployments |
| F15 | **Chat History Persistence** | Optionally persist chat history to SQLite via GRDB |
| F16 | **Theming API** | Customize colors, fonts, bubble styles, dark/light mode in ChatView |
| F17 | **Streaming Responses** | Token-by-token display for cloud LLM providers |
| F18 | **Export Results** | Copy/share query results as CSV, JSON, or formatted text |
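
F18 is not implemented yet, so the following is only a sketch of what CSV export of a query result could involve. `csvField` and `csv` are hypothetical helpers, and the quoting follows the usual RFC 4180 convention (quote fields containing commas, quotes, or newlines; double embedded quotes).

```swift
import Foundation

/// Hypothetical helper: quotes one field when it contains a comma, quote, or newline.
func csvField(_ value: String) -> String {
    let needsQuoting = value.contains { $0 == "," || $0 == "\"" || $0.isNewline }
    guard needsQuoting else { return value }
    return "\"" + value.replacingOccurrences(of: "\"", with: "\"\"") + "\""
}

/// Hypothetical helper: renders a header line plus one line per row; NULL becomes empty.
func csv(columns: [String], rows: [[String?]]) -> String {
    let header = columns.map(csvField).joined(separator: ",")
    let body = rows.map { row in
        row.map { csvField($0 ?? "") }.joined(separator: ",")
    }
    return ([header] + body).joined(separator: "\r\n")
}

let exported = csv(
    columns: ["id", "title"],
    rows: [["1", "Buy milk, eggs"], ["2", nil]]
)
```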
### P2 — Nice to Have (v2.0+)
| # | Feature | Description |
|---|---|---|
| F19 | **Voice Input** | Speech-to-text for hands-free data queries |
| F20 | **MCP Server Mode** | Expose any SQLite database as an MCP server so external LLM clients can query it |
| F21 | **Suggested Questions** | Auto-generate starter questions based on introspected schema |
| F22 | **Audit Log** | Log all mutations with timestamp, before/after values |
| F23 | **Multi-Database** | Support querying across multiple SQLite databases simultaneously |
---
## 8. Privacy & Security
| Concern | Approach |
|---|---|
| **Provider choice is yours** | Use Ollama or a self-hosted model to keep data off third-party servers |
| **No telemetry** | The package collects nothing |
| **API key handling** | Cloud provider keys are never persisted by the kit; developer is responsible for secure storage |
| **SQL safety** | Developer-configured `OperationAllowlist` controls which SQL operations the engine will execute, whatever the LLM generates. The check covers the operation type only; there is no attempt at SQL parsing for injection prevention. The developer is responsible for setting appropriate allowlist levels. |
| **Mutation safety** | `MutationPolicy` provides per-table restrictions. DELETE requires explicit confirmation by default via `ToolExecutionDelegate`. |
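| **Data stays in-process** | Query results are held in memory; SwiftDBAI does not write them to disk. Rows passed to `TextSummaryRenderer` go to the configured LLM provider, so they leave the device only if the developer chooses a remote provider. |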
| **Connection ownership** | Developer manages their own GRDB `DatabasePool`/`DatabaseQueue` when passing one in. SwiftDBAI never closes or migrates a developer-supplied database, and opens one itself only when given a path (`DataChatView(databasePath:)`). |
---
## 9. Technical Constraints
- **Swift Package Manager** only (no CocoaPods/Carthage)
- **Minimum deployments:** iOS 17.0, macOS 14.0, visionOS 1.0
- **Swift 6.1** toolchain, Swift 6 language mode with strict concurrency checking
- **Dependencies:** GRDB.swift 7.0+ and AnyLanguageModel (branch: main)
- **No UIKit dependency** — pure SwiftUI for the view layer
- **No SwiftData dependency** — pure GRDB/SQL throughout. Works with any SQLite database regardless of how it was created.
- **No Core Data dependency** — no ORM layer of any kind
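
The dependency and platform constraints above translate into a manifest along these lines. This is a sketch for a consuming package: the name `MyApp` is a placeholder, and SwiftDBAI's own repository URL is not given in this document, so only its two stated dependencies are shown.

```swift
// swift-tools-version: 6.1
import PackageDescription

// "MyApp" is a placeholder target; adjust names to your project.
let package = Package(
    name: "MyApp",
    platforms: [.iOS(.v17), .macOS(.v14), .visionOS(.v1)],
    dependencies: [
        // GRDB.swift 7.0+ as stated in the constraints.
        .package(url: "https://github.com/groue/GRDB.swift", from: "7.0.0"),
        // AnyLanguageModel is tracked on a branch, not a tagged version.
        .package(url: "https://github.com/huggingface/AnyLanguageModel", branch: "main"),
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [
                .product(name: "GRDB", package: "GRDB.swift"),
                .product(name: "AnyLanguageModel", package: "AnyLanguageModel"),
            ]
        )
    ]
)
```

Note that SwiftPM does not allow a package with branch-based dependencies to be consumed by a version-pinned dependent, which is one practical consequence of the `branch: main` pin raised under Open Questions.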
---
## 10. Implementation Status
| Metric | Current |
|---|---|
| Source files | 30 |
| Test files | 19 |
| Tests passing | 352 |
| Swift language mode | 6 (Swift 6.1 toolchain) |
| Dependencies | GRDB.swift 7.0+, AnyLanguageModel |
---
## 11. Success Metrics
| Metric | Target |
|---|---|
| Integration time | < 5 minutes for basic "chat with my data" — provide a database path and a model |
| Query accuracy | > 90% of common queries (SELECT with filters, sorting, aggregates) produce correct SQL on first attempt |
| Latency (kit overhead) | < 500ms for schema introspection + SQL validation on a typical 20-table database (excludes LLM response time) |
| Package size | < 2 MB added to app binary (excluding LLM model weights) |
| Crash rate | 0 crashes from kit code in production |
---
## 12. Open Questions
1. **AnyLanguageModel maturity** — The library is relatively new; we need to track API stability and pin to a specific version. What's our fallback if breaking changes land? (Currently pinned to `branch: main`.)
2. **SQL injection surface** — The allowlist check validates operation type but does not parse SQL structure. Should we add a lightweight SQL tokenizer for additional safety, or is the allowlist sufficient given the LLM is the only SQL author?
3. **Schema change detection** — `SchemaIntrospector` caches the schema after first introspection. If the database schema changes at runtime (migrations, etc.), the cache becomes stale. Should we add a `schema_version` PRAGMA check or a manual invalidation API?
4. **Large schema handling** — For databases with many tables (100+), the schema description in the LLM system prompt may be very large. Should we add table filtering or relevance ranking?
5. **Chart auto-detection accuracy** — `ChartDataDetector` heuristically determines if results are chart-eligible. How do we handle false positives/negatives?
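
For question 3, one possible shape of a `schema_version`-based check is a cache that re-runs introspection only when the version number changes. The sketch below is an assumption, not a decided design: `VersionedCache` is hypothetical, and the caller is expected to supply the current value of `PRAGMA schema_version` (which SQLite changes whenever the schema changes).

```swift
/// Hypothetical cache keyed by SQLite's `PRAGMA schema_version`.
struct VersionedCache<Schema> {
    private var cached: (version: Int, schema: Schema)?

    init() {}

    /// `currentVersion` would come from `PRAGMA schema_version`;
    /// `introspect` runs only when the version differs from the cached one.
    mutating func schema(currentVersion: Int, introspect: () throws -> Schema) rethrows -> Schema {
        if let cached, cached.version == currentVersion { return cached.schema }
        let fresh = try introspect()
        cached = (currentVersion, fresh)
        return fresh
    }
}

var cache = VersionedCache<[String]>()
var introspections = 0

_ = cache.schema(currentVersion: 7) { introspections += 1; return ["tasks", "projects"] }
_ = cache.schema(currentVersion: 7) { introspections += 1; return ["tasks", "projects"] }  // served from cache
let afterMigration = cache.schema(currentVersion: 8) { introspections += 1; return ["tasks", "projects", "users"] }
```

Reading the PRAGMA on every request costs one cheap query; a manual invalidation API would avoid even that but relies on the developer remembering to call it.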
---
## 13. Milestones
| Milestone | Scope | Status |
|---|---|---|
| **M1: Foundation** | SchemaIntrospector + SQLQueryParser + headless ChatEngine | Done |
| **M2: Safety** | OperationAllowlist + MutationPolicy + QueryValidator + ToolExecutionDelegate | Done |
| **M3: Chat UI** | DataChatView + ChatView + ChatViewModel + MessageBubbleView + ErrorMessageView | Done |
| **M4: Rendering** | TextSummaryRenderer + ScrollableDataTableView + ChartDataDetector + Bar/Line/Pie charts | Done |
| **M5: Multi-turn** | ConversationHistory + context window + PromptBuilder with history | Done |
| **M6: Polish & Ship** | Error handling (SwiftDBAIError), 352 tests, documentation | Done |
---
## 14. References
- [GRDB.swift](https://github.com/groue/GRDB.swift) — SQLite toolkit for Swift
- [AnyLanguageModel (HuggingFace)](https://github.com/huggingface/AnyLanguageModel) — Unified Swift LLM abstraction
- [Swift Charts](https://developer.apple.com/documentation/charts) — Apple's declarative charting framework
- [Model Context Protocol (MCP)](https://modelcontextprotocol.io) — For future MCP server mode
- [Swift Package Manager](https://www.swift.org/documentation/package-manager/)
- [SQLite PRAGMA Statements](https://www.sqlite.org/pragma.html) — Used for schema introspection