RAG Search
Retrieval-augmented generation powering all agent responses. Every query is grounded in 500+ indexed documents — code, docs, conversations, and institutional knowledge — retrieved by semantic similarity and injected as live context.
How It Works
Ingest
Documents are chunked, embedded, and indexed into the vector store. Metadata tagging and overlap windows preserve context across chunk boundaries.
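The chunking step above can be sketched as follows; this is a minimal illustration with a fixed character budget, and the names (`chunk_text`, `CHUNK_SIZE`, `OVERLAP`) are assumptions, not the actual implementation:

```python
CHUNK_SIZE = 200   # characters per chunk (illustrative)
OVERLAP = 50       # characters shared between adjacent chunks

def chunk_text(text: str, size: int = CHUNK_SIZE, overlap: int = OVERLAP) -> list[dict]:
    """Split text into overlapping chunks, tagging each with source offsets."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        # Offsets let downstream consumers attribute a chunk to its span.
        chunks.append({"text": piece, "start": start, "end": start + len(piece)})
        if start + size >= len(text):
            break
    return chunks
```

The overlap window means each boundary sentence appears in two chunks, so a query matching text near a boundary still retrieves complete context.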
Search
Semantic similarity search finds the most relevant chunks for any query. Ranking uses cosine similarity over embeddings rather than keyword matching, so meaning is preserved.
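In code, cosine-similarity ranking looks roughly like this; the embeddings below are hand-made stand-ins for real model output, and the helper names are illustrative:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[float, str]]:
    """Score every indexed vector against the query, return the k best."""
    scored = [(cosine(query, vec), doc) for doc, vec in index.items()]
    return sorted(scored, reverse=True)[:k]
```

Because ranking is done in embedding space, a document about "staleness detection" can match a query about "outdated content" even with zero shared keywords.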
Augment
Retrieved context is injected into agent prompts before each execution round. Agents arrive grounded, not blank — answers are traceable to source documents.
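The injection step can be sketched as simple prompt assembly; the exact template is an assumption, but the shape (cited context block, then the question) follows the description above:

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Prepend retrieved chunks, each tagged with its source, to the question."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Tagging each chunk with its source is what makes answers traceable: the agent can cite `[docs/rag.md]` rather than asserting from memory.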
Capabilities
Semantic Search
Cosine similarity across dense embeddings. Finds conceptually related content even when exact keywords are absent.
Chunk Management
Automatic chunking with configurable overlap. Each chunk carries source metadata for attribution and re-ranking.
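One plausible shape for a chunk record carrying that metadata (field names are illustrative, not the actual schema):

```python
# Hypothetical chunk record: enough metadata for attribution and re-ranking.
chunk = {
    "id": "docs/rag.md#3",                    # source path plus chunk position
    "text": "Semantic similarity search finds the most relevant chunks...",
    "source": "docs/rag.md",                  # attribution target
    "position": 3,                            # ordinal within the document
    "overlap": 50,                            # characters shared with neighbors
    "indexed_at": "2024-01-01T00:00:00Z",     # used by freshness tracking
}
```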
Multi-Index
Separate indexes for docs, code, conversations, and entities. Queries can target a single index or fan out across all four.
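Fan-out across the four indexes can be sketched as query-each-then-merge-by-score; `search_one` here is a hypothetical stand-in for a per-index search call:

```python
INDEXES = ["docs", "code", "conversations", "entities"]

def fan_out(query: str, search_one, indexes: list[str] = INDEXES, k: int = 5):
    """search_one(index, query) -> list of (score, chunk) pairs.

    Merge per-index results and keep the k highest-scoring overall."""
    merged = []
    for name in indexes:
        merged.extend((score, name, chunk) for score, chunk in search_one(name, query))
    return sorted(merged, reverse=True)[:k]
```

Targeting a single index is then just `fan_out(query, search_one, indexes=["code"])`.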
Freshness Tracking
Automatic staleness detection flags outdated chunks. Re-indexing triggers run on document updates to keep the knowledge layer current.
API Endpoints
rag/search
Semantic search across all indexed content. Returns ranked chunks with source references and similarity scores.
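A request/response shape consistent with that description might look like this; the field names and values are illustrative, not the actual schema:

```
POST rag/search
{"query": "how are documents chunked?", "index": "docs", "top_k": 3}

200 OK
{"results": [
  {"text": "Documents are chunked, embedded, and indexed...",
   "source": "docs/rag.md", "score": 0.91}
]}
```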
rag/ask
RAG-powered question answering. Retrieves the most relevant context and synthesizes a grounded, cited response.
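A sketch of the question-answering exchange, with illustrative field names:

```
POST rag/ask
{"question": "How is index freshness tracked?"}

200 OK
{"answer": "Stale chunks are flagged automatically and re-indexed on update...",
 "citations": [{"source": "docs/rag.md", "score": 0.88}]}
```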
rag/ingest
Feed new documents into the index. Accepts plain text, markdown, and structured content. Chunking and embedding run automatically.
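An ingest request might look like this (field names are assumptions; chunking and embedding happen server-side, so the payload is just content plus provenance):

```
POST rag/ingest
{"content": "# Deploy runbook\nSteps for a clean rollout...",
 "format": "markdown",
 "source": "runbooks/deploy.md"}
```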
rag/stats
Index statistics and health. Reports document count, chunk count, index freshness, and per-index breakdown.
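A stats response consistent with that description, with illustrative numbers and field names:

```
GET rag/stats

200 OK
{"documents": 512,
 "chunks": 12480,
 "indexes": {"docs": 6200, "code": 4100, "conversations": 1800, "entities": 380},
 "stale_chunks": 14}
```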
Stats
500+ documents | 12,000+ chunks | 4 indexes | <100ms P95 latency