
RAG Implementation

Retrieval Augmented Generation (RAG) enhances LLM responses by injecting relevant context from your document corpus. ignitionstack.pro implements a complete RAG pipeline using Supabase pgvector for vector storage and similarity search.

How RAG Works

RAG Pipeline

Document upload:
  Document (PDF) → Extract Text → Chunk Text → Generate Embeddings → Store in pgvector

Query time:
  User Query → Generate Embedding → Similarity Search (pgvector) → Retrieve Top-K Chunks → Augment Context + LLM Call

Architecture

Core Components

Component           Location                           Purpose
RAG Service         lib/ai/rag/rag-service.ts          Orchestrates retrieval and augmentation
Embedding Service   lib/ai/rag/embedding-service.ts    Generates vector embeddings
Document Processor  lib/ai/rag/document-processor.ts   Extracts and processes documents
Chunking            lib/ai/rag/chunking.ts             Splits text into optimal chunks
PDF Extractor       lib/ai/rag/pdf-extractor.ts        Extracts text from PDFs

Database Schema

-- Embeddings table with pgvector
CREATE TABLE ai_embeddings (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  conversation_id UUID REFERENCES ai_conversations(id),
  document_id UUID REFERENCES ai_documents(id),
  user_id UUID REFERENCES auth.users(id),
  content TEXT NOT NULL,
  embedding VECTOR(1536), -- OpenAI dimension
  metadata JSONB DEFAULT '{}',
  chunk_index INTEGER,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Similarity search function
CREATE FUNCTION match_embeddings(
  query_embedding VECTOR(1536),
  match_threshold FLOAT,
  match_count INT,
  filter_conversation_id UUID
)
RETURNS TABLE (
  id UUID,
  content TEXT,
  similarity FLOAT
) AS $$
  SELECT
    id,
    content,
    1 - (embedding <=> query_embedding) AS similarity
  FROM ai_embeddings
  WHERE conversation_id = filter_conversation_id
    AND 1 - (embedding <=> query_embedding) > match_threshold
  ORDER BY embedding <=> query_embedding
  LIMIT match_count;
$$ LANGUAGE sql;
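
The match_embeddings function can also be called directly from application code through the Supabase RPC interface. A minimal sketch, assuming a configured supabase-js client and using the embedding service described below; the environment variable names and the query text are illustrative, not part of the generated stack:

import { createClient } from '@supabase/supabase-js'
import { EmbeddingService } from '@/lib/ai/rag/embedding-service'

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
)
const embeddingService = new EmbeddingService({
  provider: 'openai',
  model: 'text-embedding-3-small',
})

// Embed the query, then run the pgvector similarity search via RPC
const queryVector = await embeddingService.embed('What are the key findings?')

const { data: matches, error } = await supabase.rpc('match_embeddings', {
  query_embedding: queryVector,
  match_threshold: 0.7,
  match_count: 5,
  filter_conversation_id: 'conv-123',
})
if (error) throw error
// matches: [{ id, content, similarity }, ...]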

Document Processing

Supported File Types

Type        MIME              Extraction Method
Plain Text  text/plain        Direct read
Markdown    text/markdown     Direct read
JSON        application/json  Stringify content
PDF         application/pdf   pdf-parse extraction

Processing Pipeline

// src/app/lib/ai/rag/document-processor.ts
import { DocumentProcessor } from '@/lib/ai/rag/document-processor'

const processor = new DocumentProcessor({
  chunkingStrategy: 'recursive',
  maxChunkSize: 1000,
  chunkOverlap: 200,
  embeddingProvider: 'openai',
})

// Process a document
const result = await processor.process({
  file: uploadedFile,
  conversationId: 'conv-123',
  userId: 'user-456',
})
// Returns: { chunks: 15, embeddings: 15, documentId: 'doc-789' }

PDF Extraction

// src/app/lib/ai/rag/pdf-extractor.ts
import { extractPDFText } from '@/lib/ai/rag/pdf-extractor'

const { text, metadata } = await extractPDFText(pdfBuffer)
// metadata: { pages: 10, title: 'Report', author: 'John' }

Chunking Strategies

Fixed Chunking

Simple fixed-size splits with overlap:

const chunks = chunkText(text, {
  strategy: 'fixed',
  maxSize: 1000,
  overlap: 200,
})

Recursive Chunking

Hierarchically splits on semantic boundaries:

const chunks = chunkText(text, {
  strategy: 'recursive',
  maxSize: 1000,
  overlap: 200,
  separators: ['\n\n', '\n', '. ', ' ', ''],
})

How it works:

  1. Try splitting on paragraphs (\n\n)
  2. If the chunks are still too large, split on newlines (\n)
  3. Continue with sentences (. )
  4. Finally, split on words/characters (see the sketch below)
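
For illustration, a minimal standalone sketch of this fallback logic. It is a simplification of what lib/ai/rag/chunking.ts does, not the actual implementation; chunk merging and overlap handling are omitted:

// Recursively split text on progressively finer separators until
// every piece fits within maxSize characters.
function recursiveSplit(
  text: string,
  separators: string[] = ['\n\n', '\n', '. ', ' ', ''],
  maxSize = 1000
): string[] {
  if (text.length <= maxSize) return [text]

  const [separator, ...rest] = separators

  // Last resort: hard-split into fixed-size character windows
  if (separator === undefined || separator === '') {
    const pieces: string[] = []
    for (let i = 0; i < text.length; i += maxSize) {
      pieces.push(text.slice(i, i + maxSize))
    }
    return pieces
  }

  // Split on the current separator, then recurse into pieces that are still too large
  return text
    .split(separator)
    .filter((piece) => piece.trim().length > 0)
    .flatMap((piece) =>
      piece.length > maxSize ? recursiveSplit(piece, rest, maxSize) : [piece]
    )
}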

Semantic Chunking

Preserves paragraph boundaries for meaning:

const chunks = chunkText(text, {
  strategy: 'semantic',
  maxSize: 1000,
})

Embedding Generation

Supported Providers

Provider  Model                   Dimensions  Cost
OpenAI    text-embedding-3-small  1536        $0.02/1M tokens
OpenAI    text-embedding-3-large  3072        $0.13/1M tokens
Gemini    text-embedding-004      768         ~Free
Ollama    nomic-embed-text        768         Free (local)

Usage

// src/app/lib/ai/rag/embedding-service.ts
import { EmbeddingService } from '@/lib/ai/rag/embedding-service'

const embeddingService = new EmbeddingService({
  provider: 'openai',
  model: 'text-embedding-3-small',
})

// Single embedding
const vector = await embeddingService.embed('Hello world')

// Batch embeddings (more efficient)
const vectors = await embeddingService.embedBatch([
  'First chunk',
  'Second chunk',
  'Third chunk',
])

Similarity Calculation

// Cosine similarity between vectors
const similarity = embeddingService.cosineSimilarity(vectorA, vectorB)
// Returns: 0.0 to 1.0 (higher = more similar)
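
Under the hood this is the standard cosine similarity formula. For reference, a minimal standalone sketch of the math the method computes (the free-standing helper is illustrative; in the stack it is a method on EmbeddingService):

// dot(a, b) / (|a| * |b|): 1.0 means identical direction, 0.0 means orthogonal
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}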

RAG Service

Retrieving Context

// src/app/lib/ai/rag/rag-service.ts
import { RAGService } from '@/lib/ai/rag/rag-service'

const ragService = new RAGService({
  provider: 'openai',
  model: 'text-embedding-3-small',
  topK: 5,
  similarityThreshold: 0.7,
  maxContextTokens: 2000,
})

// Retrieve relevant chunks
const context = await ragService.retrieveContext({
  query: 'What are the key findings?',
  conversationId: 'conv-123',
})
// Returns array of matching chunks with similarity scores

Augmenting Messages

// Automatically augment messages with RAG context
const augmentedMessages = await ragService.augmentMessages({
  messages: conversationHistory,
  query: userMessage,
  conversationId: 'conv-123',
})
// The system message now includes:
// "Use the following context to answer: [Source 1] ... [Source 2] ..."

Citation Extraction

// Extract citations from LLM response
const citations = ragService.extractCitations(
  'According to [Source 1], the revenue increased...'
)
// Returns: [{ sourceIndex: 1, text: 'the revenue increased...' }]
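
One plausible way such extraction can work is a regular expression over the [Source N] markers. The sketch below illustrates the idea; it is not necessarily the service's exact logic:

// Find each "[Source N]" marker and capture the text that follows it
function extractCitations(response: string): { sourceIndex: number; text: string }[] {
  const citations: { sourceIndex: number; text: string }[] = []
  const pattern = /\[Source (\d+)\],?\s*([^[.]*)/g
  let match: RegExpExecArray | null
  while ((match = pattern.exec(response)) !== null) {
    citations.push({ sourceIndex: Number(match[1]), text: match[2].trim() })
  }
  return citations
}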

API Endpoints

Document Upload

// POST /api/ai/upload
const formData = new FormData()
formData.append('file', pdfFile)
formData.append('conversationId', 'conv-123')

const response = await fetch('/api/ai/upload', {
  method: 'POST',
  body: formData,
})
// Response: { documentId, chunks, status: 'processed' }

Chat with RAG

// POST /api/ai/chat
// RAG is automatically applied when the conversation has documents
const response = await fetch('/api/ai/chat', {
  method: 'POST',
  body: JSON.stringify({
    conversationId: 'conv-123',
    message: 'Summarize the uploaded document',
    enableRAG: true, // optional, defaults to true if docs exist
  }),
})

Configuration

Environment Variables

# .env.local
ENABLE_RAG=true
EMBEDDING_MODEL=text-embedding-3-small
RAG_TOP_K=5
RAG_SIMILARITY_THRESHOLD=0.7
RAG_MAX_CONTEXT_TOKENS=2000

Conversation-Level Settings

// Enable RAG for a conversation
await conversationRepository.update(conversationId, {
  settings: {
    enableRAG: true,
    ragTopK: 5,
    ragThreshold: 0.7,
  },
})

Best Practices

1. Optimize Chunk Size
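
As a rule of thumb, chunks of roughly 500-1,500 characters with 10-20% overlap balance retrieval precision against fragmented context: very small chunks lose surrounding meaning, while very large ones dilute similarity scores and waste context tokens. A reasonable starting point using the processor options shown earlier (the exact numbers are a suggestion, not a requirement):

import { DocumentProcessor } from '@/lib/ai/rag/document-processor'

const processor = new DocumentProcessor({
  chunkingStrategy: 'recursive',
  maxChunkSize: 1000,   // roughly 200-300 tokens per chunk
  chunkOverlap: 200,    // ~20% overlap preserves context across chunk boundaries
  embeddingProvider: 'openai',
})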

2. Choose the Right Embedding Model

Use Case          Recommended Model
General purpose   text-embedding-3-small
High accuracy     text-embedding-3-large
Cost-sensitive    nomic-embed-text (Ollama)
Privacy-critical  nomic-embed-text (local)
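
Switching models is a configuration change on EmbeddingService. The sketch below assumes an 'ollama' provider identifier based on the provider table above (check the generated code for the exact string). Keep in mind that embeddings from different models are not comparable, so existing documents must be re-embedded, and the VECTOR(1536) column size must match the new model's dimensions:

import { EmbeddingService } from '@/lib/ai/rag/embedding-service'

// Local, privacy-preserving embeddings via Ollama (768 dimensions)
const embeddingService = new EmbeddingService({
  provider: 'ollama',        // assumed identifier; see the provider table above
  model: 'nomic-embed-text',
})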

3. Tune Similarity Threshold
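
The 0.7 default is a reasonable middle ground: raising it (for example to 0.8) returns fewer but more precise chunks, while lowering it (0.5-0.6) recalls more loosely related content. A sketch of adjusting it per conversation with the settings shown earlier (the values are illustrative):

// Stricter matching for precise, fact-heavy documents
await conversationRepository.update(conversationId, {
  settings: {
    enableRAG: true,
    ragTopK: 5,
    ragThreshold: 0.8,
  },
})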

4. Handle Large Documents

// Process in batches to avoid timeouts
const processor = new DocumentProcessor({
  batchSize: 10,            // Process 10 chunks at a time
  delayBetweenBatches: 100, // ms delay to avoid rate limits
})

5. Monitor Embedding Costs

-- Check embedding token usage
SELECT
  DATE(created_at) AS date,
  SUM(tokens) AS total_tokens,
  SUM(cost) AS total_cost
FROM ai_usage_logs
WHERE operation = 'embedding'
GROUP BY DATE(created_at)
ORDER BY date DESC;

Troubleshooting

No Results Returned

  1. Check the similarity threshold (try lowering it to 0.5, as in the sketch below)
  2. Verify embeddings exist: SELECT COUNT(*) FROM ai_embeddings WHERE conversation_id = ?
  3. Ensure embedding dimensions match (1536 for OpenAI)
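
A quick way to confirm whether the threshold is the problem is to rerun the same query through RAGService with a looser configuration (the values below are for debugging only):

import { RAGService } from '@/lib/ai/rag/rag-service'

// Temporarily relax the threshold and widen topK to see what the index returns at all
const debugRag = new RAGService({
  provider: 'openai',
  model: 'text-embedding-3-small',
  topK: 10,
  similarityThreshold: 0.5,
  maxContextTokens: 2000,
})

const context = await debugRag.retrieveContext({
  query: 'What are the key findings?',
  conversationId: 'conv-123',
})
console.log(context) // inspect similarity scores to pick a sensible threshold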

Slow Retrieval

  1. Add pgvector index:
CREATE INDEX ON ai_embeddings USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
  2. Reduce the topK parameter
  3. Consider dimension reduction for large datasets

Irrelevant Context

  1. Improve chunking strategy (try semantic)
  2. Increase similarity threshold
  3. Add metadata filtering