FastAPI · MongoDB · Pinecone · Gemini · Next.js

RAG engine for secure document intelligence

Query your files securely with grounded, traceable source citations.

ENV: CONFIGURED · SECURE: ENABLED · CITED RESPONSES
trace_console::session_4b92x
READY

RAG Pipeline Trace

INPUT QUERY

> What is our log retention period?

[10:33:41.02] EMBEDDING_GEN: embeddings generator [~40ms]
[10:33:41.07] VECTOR_MATCH: vector similarity [~80ms]
[10:33:41.15] CONTEXT_BUILD: context compiled [~5ms]
[10:33:41.16] LLM_RESPONSE: context inference [~700ms]

CITED DOCUMENT SEGMENT

"...Log data is retained for 90 days. After this retention period, audit logs are permanently deleted..."

source: policy_sample.pdf · relevance: high

Ingestion & Retrieval Pipeline

Every document is split, embedded, and indexed with user isolation. Access your database through raw REST endpoints.

Pipeline Stages

01. Upload

Multipart binary upload stream

FastAPI handles PDF, DOCX, or TXT file buffer ingestion

02. Parse

Text extraction & layout parsing

Extract metadata, pages, structural headers, and layouts

03. Chunk

Recursive character chunking

500-token sliding windows with a 10% (50-token) overlap between consecutive chunks

04. Embed

Vector representation

Generate 1024-dimensional embeddings via Gemini API

05. Retrieve

Namespace vector search

Pinecone top-k cosine similarity filtered by document scope

06. Generate

Context-grounded LLM inference

The LLM answers the query using only the matching document chunks

07. Citations

Source verification

Verify token offsets and map citations directly to source pages
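
As a rough illustration of the citation stage, the sketch below checks that a quoted excerpt really occurs in the source text and reports which page it came from. The function name `locate_citation`, the page texts, and the returned fields are all hypothetical; the quote is the sample excerpt from the trace above. It matches on characters rather than tokens, so treat it as a simplified stand-in for the offset verification described here, not the service's implementation.

```python
from typing import List, Optional


def locate_citation(quote: str, pages: List[str]) -> Optional[dict]:
    """Find `quote` in `pages` and return its 1-based page number and offsets.

    Offsets refer to the whitespace-normalized, lowercased page text.
    Returns None when the quote cannot be verified against any page.
    """
    # Drop the leading/trailing ellipses used in excerpts, then normalize.
    needle = " ".join(quote.strip(". ").split()).lower()
    if not needle:
        return None  # nothing left to verify
    for page_number, text in enumerate(pages, start=1):
        haystack = " ".join(text.split()).lower()
        start = haystack.find(needle)
        if start != -1:
            return {"page": page_number, "start": start, "end": start + len(needle)}
    return None


# Hypothetical page texts, made up for this example.
pages = [
    "1. Scope. This policy covers application and audit logs.",
    "2. Retention. Log data is retained for 90 days. After this retention "
    "period, audit logs are permanently deleted from primary storage.",
]
quote = (
    "...Log data is retained for 90 days. After this retention period, "
    "audit logs are permanently deleted..."
)

hit = locate_citation(quote, pages)
print(hit["page"] if hit else "citation could not be verified")
```

A citation that returns None would be dropped or flagged rather than shown to the user, which is one way to keep responses traceable to their sources.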

query_rag.py
Python 3.11
import httpx

# Base URL and API key are the sample values shown on this page; substitute your own.
client = httpx.Client(base_url="https://api.simplify.ai/v1")

# Scope vector search to document namespaces
response = client.post(
    "/chat/query",
    headers={"Authorization": "Bearer sk_live_9a2f"},
    json={
        "query": "What is the log retention period?",
        "document_ids": ["doc_policy_v4"],
        "response_mode": "rag_mode",
        "parameters": {
            "temperature": 0.2,
            "max_tokens": 1024
        }
    }
)
response.raise_for_status()  # surface 4xx/5xx errors before parsing the body

payload = response.json()
print(f"Answer: {payload['content']}")
print(f"Sources: {len(payload['citations'])} cited.")
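
The `document_ids` field in the request above restricts retrieval to specific documents. To make that concrete, here is a minimal in-memory sketch of a top-k cosine similarity search filtered by document scope. It does not use the Pinecone client; the record layout, the `top_k` helper, and the 3-dimensional toy vectors are assumptions chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    records: List[Dict],
    document_ids: List[str],
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Return up to k (score, chunk_id) pairs, best first, from the allowed documents."""
    allowed = set(document_ids)
    scored = [
        (cosine(query_vec, r["vector"]), r["chunk_id"])
        for r in records
        if r["document_id"] in allowed  # chunks outside the scope are never scored
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy records; real embeddings would have far more dimensions.
records = [
    {"chunk_id": "c1", "document_id": "doc_policy_v4", "vector": [1.0, 0.0, 0.0]},
    {"chunk_id": "c2", "document_id": "doc_policy_v4", "vector": [0.0, 1.0, 0.0]},
    {"chunk_id": "c3", "document_id": "doc_other", "vector": [1.0, 0.0, 0.0]},
]

hits = top_k([0.9, 0.1, 0.0], records, document_ids=["doc_policy_v4"], k=2)
print([chunk_id for _, chunk_id in hits])
```

Filtering before scoring means a chunk from another document cannot appear in the results, even when its vector is a closer match to the query.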
Default Chunk Size: 500 tokens
Overlap Margin: 50 tokens
Database Client: Motor (MongoDB Async)
Auth Middleware: HMAC-SHA256 JWT Rotation
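
To show how the chunk size and overlap defaults interact, the sketch below splits a token list into 500-token windows that each share 50 tokens with the previous window. The `chunk_tokens` helper and the synthetic token list are assumptions for illustration; a real pipeline would take its tokens from a tokenizer.

```python
from typing import List


def chunk_tokens(tokens: List[str], size: int = 500, overlap: int = 50) -> List[List[str]]:
    """Split `tokens` into windows of `size` that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # with the defaults, each window starts 450 tokens later
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Synthetic tokens stand in for real tokenizer output (assumption).
tokens = [f"tok{i}" for i in range(1200)]
chunks = chunk_tokens(tokens)
print([len(chunk) for chunk in chunks])
```

The overlap keeps a sentence that straddles a window boundary fully present in at least one chunk, at the cost of storing roughly 10% more tokens.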

Console Interface

Monitor vector spaces, verify source citations, and manage isolated document namespaces.

env: demo
model: configured · RAG mode
> summarize key metrics from the board deck example
RAG Response Pipeline (Grounded)

Based on the uploaded board_deck_example.pdf:

  • QoQ revenue expansion reached 18%, bringing consolidated revenue to $12.4M.
  • Enterprise RAG contract expansions accounted for 72% of new recurring growth.
  • Operational hosting overhead decreased by 14% via vector similarity cache optimization.
CITATIONS PREVIEW (2)
1. board_deck_example.pdf [document excerpt] · High relevance

"...consolidated Q4 revenue reached $12.4M, representing an 18% growth quarter-over-quarter..."

2. board_deck_example.pdf [document excerpt] · High relevance

"...recurring growth expansion vectors were heavily anchored in enterprise client accounts..."

Representative Dataset: 148k chunks
Storage Footprint: ~40 MB
Typical RAG Latency: ~150ms
Cache Efficiency: Illustrative