Retrieval Augmented Generation (RAG) enhances LLM responses by injecting relevant context from your document corpus. ignitionstack.pro implements a complete RAG pipeline using Supabase pgvector for vector storage and similarity search.
┌─────────────────────────────────────────────────────────────────┐
│                          RAG Pipeline                            │
└─────────────────────────────────────────────────────────────────┘

Document Upload                 Query Time
─────────────────               ─────────────────────────────────────

┌──────────┐                    ┌──────────┐
│ Document │                    │   User   │
│  (PDF)   │                    │  Query   │
└────┬─────┘                    └────┬─────┘
     │                               │
     ▼                               ▼
┌──────────┐                    ┌──────────┐
│ Extract  │                    │ Generate │
│   Text   │                    │ Embedding│
└────┬─────┘                    └────┬─────┘
     │                               │
     ▼                               ▼
┌──────────┐                    ┌──────────────────┐
│  Chunk   │                    │ Similarity Search│
│   Text   │                    │    (pgvector)    │
└────┬─────┘                    └────────┬─────────┘
     │                                   │
     ▼                                   ▼
┌──────────┐                    ┌──────────────────┐
│ Generate │                    │  Retrieve Top-K  │
│Embeddings│                    │      Chunks      │
└────┬─────┘                    └────────┬─────────┘
     │                                   │
     ▼                                   ▼
┌──────────┐                    ┌──────────────────┐
│  Store   │                    │ Augment Context  │
│ pgvector │                    │   + LLM Call     │
└──────────┘                    └──────────────────┘

| Component | Location | Purpose |
|---|---|---|
| RAG Service | lib/ai/rag/rag-service.ts | Orchestrates retrieval and augmentation |
| Embedding Service | lib/ai/rag/embedding-service.ts | Generates vector embeddings |
| Document Processor | lib/ai/rag/document-processor.ts | Extracts and processes documents |
| Chunking | lib/ai/rag/chunking.ts | Splits text into optimal chunks |
| PDF Extractor | lib/ai/rag/pdf-extractor.ts | Extracts text from PDFs |
-- Embeddings table with pgvector
CREATE TABLE ai_embeddings (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
conversation_id UUID REFERENCES ai_conversations(id),
document_id UUID REFERENCES ai_documents(id),
user_id UUID REFERENCES auth.users(id),
content TEXT NOT NULL,
embedding VECTOR(1536), -- OpenAI dimension
metadata JSONB DEFAULT '{}',
chunk_index INTEGER,
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Similarity search function
CREATE FUNCTION match_embeddings(
query_embedding VECTOR(1536),
match_threshold FLOAT,
match_count INT,
filter_conversation_id UUID
) RETURNS TABLE (
id UUID,
content TEXT,
similarity FLOAT
) AS $$
SELECT id, content, 1 - (embedding <=> query_embedding) AS similarity
FROM ai_embeddings
WHERE conversation_id = filter_conversation_id
AND 1 - (embedding <=> query_embedding) > match_threshold
ORDER BY embedding <=> query_embedding
LIMIT match_count;
$$ LANGUAGE sql;
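For reference, `match_embeddings` can also be invoked directly from application code through Supabase's RPC interface. The sketch below is illustrative only: it assumes a configured Supabase client and a precomputed query embedding, and the env var names and `searchChunks` helper are placeholders (the RAG Service described later performs this retrieval for you).

```typescript
// Sketch: calling match_embeddings via Supabase RPC (client setup and names are assumptions)
import { createClient } from '@supabase/supabase-js'

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!,
)

// queryEmbedding: number[] of length 1536, e.g. produced by the embedding service
async function searchChunks(queryEmbedding: number[], conversationId: string) {
  const { data, error } = await supabase.rpc('match_embeddings', {
    query_embedding: queryEmbedding,
    match_threshold: 0.7,
    match_count: 5,
    filter_conversation_id: conversationId,
  })
  if (error) throw error
  return data as { id: string; content: string; similarity: number }[]
}
```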
| Type | MIME | Extraction Method |
|---|---|---|
| Plain Text | text/plain | Direct read |
| Markdown | text/markdown | Direct read |
| JSON | application/json | Stringify content |
| PDF | application/pdf | pdf-parse extraction |
// src/app/lib/ai/rag/document-processor.ts
import { DocumentProcessor } from '@/lib/ai/rag/document-processor'
const processor = new DocumentProcessor({
chunkingStrategy: 'recursive',
maxChunkSize: 1000,
chunkOverlap: 200,
embeddingProvider: 'openai',
})
// Process a document
const result = await processor.process({
file: uploadedFile,
conversationId: 'conv-123',
userId: 'user-456',
})
// Returns: { chunks: 15, embeddings: 15, documentId: 'doc-789' }

// src/app/lib/ai/rag/pdf-extractor.ts
import { extractPDFText } from '@/lib/ai/rag/pdf-extractor'
const { text, metadata } = await extractPDFText(pdfBuffer)
// metadata: { pages: 10, title: 'Report', author: 'John' }

The fixed strategy makes simple fixed-size splits with overlap:
const chunks = chunkText(text, {
strategy: 'fixed',
maxSize: 1000,
overlap: 200,
})

The recursive strategy splits hierarchically on semantic boundaries:
const chunks = chunkText(text, {
strategy: 'recursive',
maxSize: 1000,
overlap: 200,
separators: ['\n\n', '\n', '. ', ' ', ''],
})

How it works:

1. Splits on paragraph breaks first (\n\n)
2. Falls back to line breaks (\n)
3. Then sentence boundaries (. )
4. Finally spaces and single characters if a chunk is still too large

The semantic strategy preserves paragraph boundaries to keep meaning intact:
const chunks = chunkText(text, {
strategy: 'semantic',
maxSize: 1000,
})
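To make the maxSize/overlap mechanics concrete, here is a rough sketch of fixed-size chunking with overlap. It is illustrative only, not the chunking.ts implementation; the recursive and semantic strategies additionally respect the separator boundaries described above.

```typescript
// Sketch: fixed-size chunking with overlap (illustrative, not the library code)
function fixedChunks(text: string, maxSize = 1000, overlap = 200): string[] {
  const chunks: string[] = []
  let start = 0
  while (start < text.length) {
    const end = Math.min(start + maxSize, text.length)
    chunks.push(text.slice(start, end))
    if (end === text.length) break
    start = end - overlap // step back by the overlap so context carries across chunk boundaries
  }
  return chunks
}
```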
| Provider | Model | Dimensions | Cost |
|---|---|---|---|
| OpenAI | text-embedding-3-small | 1536 | $0.02/1M tokens |
| OpenAI | text-embedding-3-large | 3072 | $0.13/1M tokens |
| Gemini | text-embedding-004 | 768 | ~Free |
| Ollama | nomic-embed-text | 768 | Free (local) |
// src/app/lib/ai/rag/embedding-service.ts
import { EmbeddingService } from '@/lib/ai/rag/embedding-service'
const embeddingService = new EmbeddingService({
provider: 'openai',
model: 'text-embedding-3-small',
})
// Single embedding
const vector = await embeddingService.embed('Hello world')
// Batch embeddings (more efficient)
const vectors = await embeddingService.embedBatch([
'First chunk',
'Second chunk',
'Third chunk',
])

// Cosine similarity between vectors
const similarity = embeddingService.cosineSimilarity(vectorA, vectorB)
// Returns a value between -1 and 1; in practice ~0.0 to 1.0 for text embeddings (higher = more similar)
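As background, cosine similarity is the dot product of the two vectors divided by the product of their magnitudes. This is also what the SQL function computes via `1 - (embedding <=> query_embedding)`, since pgvector's `<=>` operator is cosine distance. A standalone sketch (not the service's internal code):

```typescript
// Sketch: cosine similarity between two equal-length vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```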
// src/app/lib/ai/rag/rag-service.ts
import { RAGService } from '@/lib/ai/rag/rag-service'
const ragService = new RAGService({
provider: 'openai',
model: 'text-embedding-3-small',
topK: 5,
similarityThreshold: 0.7,
maxContextTokens: 2000,
})
// Retrieve relevant chunks
const context = await ragService.retrieveContext({
query: 'What are the key findings?',
conversationId: 'conv-123',
})
// Returns array of matching chunks with similarity scores

// Automatically augment messages with RAG context
const augmentedMessages = await ragService.augmentMessages({
messages: conversationHistory,
query: userMessage,
conversationId: 'conv-123',
})
// The system message now includes:
// "Use the following context to answer: [Source 1] ... [Source 2] ..."// Extract citations from LLM response
// Extract citations from LLM response
const citations = ragService.extractCitations(
'According to [Source 1], the revenue increased...'
)
// Returns: [{ sourceIndex: 1, text: 'the revenue increased...' }]
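A minimal way to pull those markers back out of a response is a regular expression over the `[Source N]` pattern. This sketch is illustrative; the real `extractCitations` may differ:

```typescript
// Sketch: extracting [Source N] citations from a model response (illustrative)
function extractSourceCitations(text: string): { sourceIndex: number; text: string }[] {
  const citations: { sourceIndex: number; text: string }[] = []
  const pattern = /\[Source (\d+)\],?\s*([^.[]*)/g
  let match: RegExpExecArray | null
  while ((match = pattern.exec(text)) !== null) {
    citations.push({ sourceIndex: Number(match[1]), text: match[2].trim() })
  }
  return citations
}
```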
// POST /api/ai/upload
const formData = new FormData()
formData.append('file', pdfFile)
formData.append('conversationId', 'conv-123')
const response = await fetch('/api/ai/upload', {
method: 'POST',
body: formData,
})
// Response: { documentId, chunks, status: 'processed' }
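On the server side, a route handler for this endpoint would roughly wire the uploaded file into the DocumentProcessor. The following is a sketch of the general shape only, not the project's actual handler; auth and error handling are omitted and the hard-coded user id is a placeholder:

```typescript
// Sketch: app/api/ai/upload-style route handler (shape only, not the real implementation)
import { NextResponse } from 'next/server'
import { DocumentProcessor } from '@/lib/ai/rag/document-processor'

export async function POST(request: Request) {
  const formData = await request.formData()
  const file = formData.get('file') as File
  const conversationId = formData.get('conversationId') as string

  const processor = new DocumentProcessor({
    chunkingStrategy: 'recursive',
    maxChunkSize: 1000,
    chunkOverlap: 200,
    embeddingProvider: 'openai',
  })

  const result = await processor.process({
    file,
    conversationId,
    userId: 'user-id-from-session', // placeholder: resolve from auth in real code
  })

  return NextResponse.json({
    documentId: result.documentId,
    chunks: result.chunks,
    status: 'processed',
  })
}
```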
// POST /api/ai/chat
// RAG is automatically applied when conversation has documents
const response = await fetch('/api/ai/chat', {
method: 'POST',
body: JSON.stringify({
conversationId: 'conv-123',
message: 'Summarize the uploaded document',
enableRAG: true, // optional, defaults to true if docs exist
}),
})

# .env.local
ENABLE_RAG=true
EMBEDDING_MODEL=text-embedding-3-small
RAG_TOP_K=5
RAG_SIMILARITY_THRESHOLD=0.7
RAG_MAX_CONTEXT_TOKENS=2000
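Since these are plain environment variables, they are typically read with fallbacks when the services are constructed. The helper below is illustrative (the defaults mirror the values above; the object shape is an assumption):

```typescript
// Sketch: resolving RAG settings from environment variables (illustrative helper)
const ragConfig = {
  enabled: (process.env.ENABLE_RAG ?? 'true') === 'true',
  embeddingModel: process.env.EMBEDDING_MODEL ?? 'text-embedding-3-small',
  topK: Number(process.env.RAG_TOP_K ?? 5),
  similarityThreshold: Number(process.env.RAG_SIMILARITY_THRESHOLD ?? 0.7),
  maxContextTokens: Number(process.env.RAG_MAX_CONTEXT_TOKENS ?? 2000),
}
```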
// Enable RAG for a conversation
await conversationRepository.update(conversationId, {
settings: {
enableRAG: true,
ragTopK: 5,
ragThreshold: 0.7,
},
})

| Use Case | Recommended Model |
|---|---|
| General purpose | text-embedding-3-small |
| High accuracy | text-embedding-3-large |
| Cost-sensitive | nomic-embed-text (Ollama) |
| Privacy-critical | nomic-embed-text (local) |
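For the local, privacy-oriented options in the table, the provider is swapped in the same EmbeddingService configuration. A sketch, assuming the service accepts 'ollama' as a provider value (as the providers table above suggests) and that the model has been pulled locally:

```typescript
// Sketch: local embeddings via Ollama (assumes `ollama pull nomic-embed-text` has been run)
const localEmbeddings = new EmbeddingService({
  provider: 'ollama',
  model: 'nomic-embed-text',
})
```

Note that the 768-dimension models would require the `embedding VECTOR(1536)` column dimension to be adjusted to match.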
// Process in batches to avoid timeouts
const processor = new DocumentProcessor({
batchSize: 10, // Process 10 chunks at a time
delayBetweenBatches: 100, // ms delay to avoid rate limits
})
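Conceptually, the batching behaviour amounts to slicing the chunk list and pausing between slices. A rough sketch of the pattern, reusing the embeddingService instance from earlier (not the processor's internal code):

```typescript
// Sketch: embedding chunks in batches with a delay between batches (pattern only)
async function embedInBatches(
  chunks: string[],
  batchSize = 10,
  delayMs = 100,
): Promise<number[][]> {
  const all: number[][] = []
  for (let i = 0; i < chunks.length; i += batchSize) {
    const batch = chunks.slice(i, i + batchSize)
    all.push(...(await embeddingService.embedBatch(batch)))
    if (i + batchSize < chunks.length) {
      // back off between batches to stay under provider rate limits
      await new Promise((resolve) => setTimeout(resolve, delayMs))
    }
  }
  return all
}
```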
-- Check embedding token usage
SELECT
DATE(created_at) as date,
SUM(tokens) as total_tokens,
SUM(cost) as total_cost
FROM ai_usage_logs
WHERE operation = 'embedding'
GROUP BY DATE(created_at)
ORDER BY date DESC;

-- Count embeddings stored for a conversation
SELECT COUNT(*) FROM ai_embeddings WHERE conversation_id = ?;

-- IVFFlat index for faster similarity search on large tables
CREATE INDEX ON ai_embeddings
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

Tuning the topK parameter and the chunking strategy (for example semantic) also affects retrieval quality and latency.