RAG Assistant
Enterprise Document Intelligence Platform
A production-ready Retrieval-Augmented Generation (RAG) system built with LangChain and modern web technologies, featuring semantic search, conversational AI, and enterprise-grade document processing.
Project Overview
Challenge
Build a comprehensive document intelligence platform that enables natural language querying over private document collections while maintaining conversation context, supporting multiple LLM providers, and delivering enterprise-grade security and performance.
Solution
Developed a full-stack RAG application with LangChain integration, Weaviate vector storage, and intelligent conversation management, supporting both CLI and web interfaces with streaming responses and session isolation.
Technology Stack
Frontend
- Next.js 14 (React Framework)
- TypeScript (Type Safety)
- Server-Sent Events (Real-time Streaming)
- Local Storage (Session Persistence)
Backend & AI
- FastAPI (Python Web Framework)
- LangChain 0.2.16 (RAG Framework)
- Weaviate v4 (Vector Database)
- OpenAI Embeddings (text-embedding-3-small)
Document Processing
- RecursiveCharacterTextSplitter (Text Chunking)
- TextLoader (Document Ingestion)
- WeaviateVectorStore (LangChain Integration)
- Semantic Search (near_text + BM25 fallback)
LLM Integration
- OpenAI API (GPT-4, GPT-3.5)
- Ollama (Local Models)
- Jan AI (Self-hosted LLMs)
- Unified Chat Completions (OpenAI-compatible)
Core Features
Intelligent Document Processing
Sophisticated document ingestion with LangChain pipeline:
- Multi-format Support: .txt and .md files with extensible loaders
- Smart Chunking: 1000-character chunks with 200-character overlap
- Semantic Embeddings: OpenAI text-embedding-3-small integration
- Automatic Indexing: Documents auto-populated on Railway deployment
- Schema Management: Dynamic collection recreation for a clean state
Technical Implementation
Built with LangChain's document processing pipeline using TextLoader → RecursiveCharacterTextSplitter → OpenAIEmbeddings → WeaviateVectorStore for optimal retrieval performance.
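A minimal sketch of that pipeline, assuming the langchain-openai and langchain-weaviate integration packages and a locally running Weaviate instance (the file path and collection name are illustrative):

```python
import weaviate
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_weaviate.vectorstores import WeaviateVectorStore

client = weaviate.connect_to_local()  # or connect_to_custom(...) in production

# Load and chunk: 1000-character chunks with 200-character overlap.
docs = TextLoader("docs/handbook.md").load()  # illustrative path
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# Embed and index into Weaviate in one step.
vectorstore = WeaviateVectorStore.from_documents(
    chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
    client=client,
    index_name="Documents",  # illustrative collection name
)

client.close()
```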
Advanced Semantic Search
High-performance vector search with intelligent fallback:
- Primary: Weaviate near_text semantic search
- Fallback: BM25 keyword search for reliability (sketched below)
- Top-K Retrieval: Configurable result count (default: 5)
- Context Optimization: Retrieved documents truncated to 1,500 characters
- Source Attribution: Full document metadata preservation
Performance Characteristics
Achieves sub-500ms retrieval latency on collections of 10K+ documents, with automatic failover sustaining a 99.9% query success rate.
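A sketch of the dual-search strategy using the Weaviate v4 Python client; the collection name and the exact exception boundary are assumptions:

```python
from weaviate.exceptions import WeaviateBaseError

def retrieve(client, query: str, top_k: int = 5):
    collection = client.collections.get("Documents")  # illustrative name
    try:
        # Primary: semantic near_text search.
        result = collection.query.near_text(query=query, limit=top_k)
    except WeaviateBaseError:
        # Fallback: keyword matching keeps queries succeeding even if
        # the vectorizer or embedding API is unavailable.
        result = collection.query.bm25(query=query, limit=top_k)
    return [obj.properties for obj in result.objects]
```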
Conversational Memory Management
Enterprise-grade conversation handling with session isolation:
- SQLite Persistence: Local conversation storage
- Session Isolation: Multi-user support with unique session IDs
- Token Management: Smart context truncation and summarization
- History Limits: Configurable message retention (default: 20)
- Auto-pruning: 30-day conversation cleanup
Architecture
Uses a ConversationManager class with a SQLite backend, supporting session-based isolation and intelligent token budgeting to stay within the 32K-token context window; a minimal sketch follows.
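The class name comes from the project, but the method names, defaults, and schema details in this stdlib-only sketch are illustrative assumptions:

```python
import sqlite3
import time

class ConversationManager:
    """Session-scoped SQLite persistence (illustrative sketch)."""

    def __init__(self, db_path: str = "conversations.db", max_messages: int = 20):
        self.conn = sqlite3.connect(db_path)
        self.max_messages = max_messages
        # Matches the schema described under "Database Schema" below.
        self.conn.executescript("""
            CREATE TABLE IF NOT EXISTS conversations (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                timestamp REAL NOT NULL,
                role TEXT NOT NULL,
                content TEXT NOT NULL,
                session_id TEXT NOT NULL
            );
            CREATE INDEX IF NOT EXISTS idx_session_time
                ON conversations (session_id, timestamp);
        """)

    def add_message(self, session_id: str, role: str, content: str) -> None:
        self.conn.execute(
            "INSERT INTO conversations (timestamp, role, content, session_id) "
            "VALUES (?, ?, ?, ?)",
            (time.time(), role, content, session_id),
        )
        self.conn.commit()

    def get_history(self, session_id: str) -> list[tuple[str, str]]:
        # Newest messages within the retention limit, returned chronologically.
        rows = self.conn.execute(
            "SELECT role, content FROM conversations "
            "WHERE session_id = ? ORDER BY timestamp DESC LIMIT ?",
            (session_id, self.max_messages),
        ).fetchall()
        return list(reversed(rows))
```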
Streaming Response System
Real-time user experience with Server-Sent Events:
- FastAPI Streaming: StreamingResponse with SSE format
- React Integration: Real-time message updates
- Error Handling: Graceful degradation and retry logic
- Progress Tracking: Token-by-token response building
- Session Persistence: Automatic conversation saving
Technical Details
Implements async generators that yield chunks in the `data: {chunk}\n\n` SSE format for seamless frontend integration, as sketched below.
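A sketch of the streaming endpoint; the route, parameters, and generator are illustrative, with a stub standing in for the real RAG chain:

```python
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def generate_tokens(query: str, session_id: str):
    # Stand-in for the real RAG chain: yields response chunks as they arrive.
    for token in ("Hello", ", ", "world"):
        await asyncio.sleep(0)  # simulate asynchronous token arrival
        yield token

@app.get("/api/chat/stream")
async def stream_chat(q: str, session_id: str):
    async def event_stream():
        async for chunk in generate_tokens(q, session_id):
            yield f"data: {chunk}\n\n"  # one SSE frame per chunk
        yield "data: [DONE]\n\n"        # sentinel so the client can close cleanly
    return StreamingResponse(event_stream(), media_type="text/event-stream")
```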
Data Architecture & Performance
Vector Storage Strategy
Weaviate Configuration
- Primary: near_text semantic search
- Fallback: BM25 keyword search
- Embedding Model: text-embedding-3-small (collection setup sketched below)
- Context Window: 32,000 tokens
- Retrieval Latency: <500ms
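One way the collection might be configured with the v4 client: near_text requires a server-side vectorizer module, so this sketch enables text2vec-openai with the embedding model above (the collection name, single property, and recreation step are assumptions):

```python
import weaviate
from weaviate.classes.config import Configure, DataType, Property

client = weaviate.connect_to_local()

# Dynamic collection recreation for a clean state, per the schema
# management behavior described above.
if client.collections.exists("Documents"):
    client.collections.delete("Documents")

client.collections.create(
    "Documents",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small",
    ),
    properties=[Property(name="text", data_type=DataType.TEXT)],
)

client.close()
```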
Intelligent Token Management
Context Optimization Pipeline
1. Token Counting: tiktoken-based estimation (sketched below)
2. Context Truncation: progressive document trimming
3. History Summarization: LLM-powered compression
4. Final Validation: max-token enforcement
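A sketch of steps 1-2, assuming the cl100k_base encoding used by GPT-3.5/GPT-4-era models; the ranking assumption (last document is lowest-ranked) and budget handling are illustrative:

```python
import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")  # encoding choice is an assumption

def count_tokens(text: str) -> int:
    return len(ENC.encode(text))

def fit_context(docs: list[str], budget: int = 32_000) -> list[str]:
    # Progressively drop the lowest-ranked (last) document until the
    # combined context fits within the token budget.
    while docs and sum(count_tokens(d) for d in docs) > budget:
        docs = docs[:-1]
    return docs
```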
Database Schema
SQLite Conversation Storage
- conversations table: id, timestamp, role, content, session_id
- Index: Composite (session_id, timestamp) index for fast history queries
- Migration: Automatic schema updates
Advanced Features
Multi-LLM Provider Support
Flexible LLM integration with unified interface:
- OpenAI: GPT-4, GPT-3.5 Turbo with API key auth
- Ollama: Local model support (llama3.1, etc.)
- Jan AI: Self-hosted inference servers
- Custom Providers: OpenAI-compatible API support
Configuration Management
Environment-based provider switching with fallback strategies and automatic model detection via the /api/models endpoint; a configuration sketch follows.
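A sketch of environment-based switching via OpenAI-compatible endpoints; the base URLs are the providers' usual local defaults (an assumption here), and the env-var and provider names are illustrative:

```python
import os
from openai import OpenAI

PROVIDERS = {
    "openai": {"base_url": "https://api.openai.com/v1", "key_env": "OPENAI_API_KEY"},
    "ollama": {"base_url": "http://localhost:11434/v1", "key_env": None},  # local, no key
    "jan":    {"base_url": "http://localhost:1337/v1",  "key_env": None},  # Jan's usual default port
}

def make_client(provider: str | None = None) -> OpenAI:
    name = provider or os.getenv("LLM_PROVIDER", "openai")
    cfg = PROVIDERS[name]
    # Local servers ignore the key, but the client requires one to be set.
    key = os.getenv(cfg["key_env"], "") if cfg["key_env"] else "not-needed"
    return OpenAI(base_url=cfg["base_url"], api_key=key)

# Usage: client = make_client("ollama")
# reply = client.chat.completions.create(model="llama3.1", messages=[...])
```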
Enterprise Security Framework
Production-ready security with comprehensive protection:
- API Key Authentication: X-API-Key header validation (sketched below)
- Session Isolation: User-specific conversation boundaries
- CORS Configuration: Explicit origin allowlisting
- Input Sanitization: XSS and injection protection
- Rate Limiting: Per-session request throttling
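A sketch of the X-API-Key guard using FastAPI's built-in APIKeyHeader dependency; the endpoint and env-var names are illustrative:

```python
import os
from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import APIKeyHeader

app = FastAPI()
api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)

def require_api_key(key: str | None = Depends(api_key_header)) -> str:
    expected = os.environ.get("API_KEY")  # illustrative env-var name
    if not expected or key != expected:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")
    return key

@app.get("/api/documents", dependencies=[Depends(require_api_key)])
def list_documents():
    return {"documents": []}
```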
Web Interface Dashboard
Complete document management with modern UI:
- Chat Interface: Real-time streaming conversations
- Document Browser: File exploration with metadata
- Model Selection: Dynamic provider switching
- Session Management: Clear history and persistence
- API Configuration: Key management and validation
Technical Challenges & Solutions
Context Window Management
Challenge
Managing 32K token limits while preserving conversation quality and document context for long interactions.
Solution
Implemented intelligent truncation via a truncate_context() method that progressively trims documents and applies LLM-powered conversation summarization (the latter sketched below).
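A sketch of the summarization half of that strategy (the trimming half is sketched under Intelligent Token Management above); the helper name, model choice, and prompt are assumptions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    """Compress older turns into one summary message instead of dropping them."""
    old, recent = messages[:-keep_last], messages[-keep_last:]
    if not old:
        return messages
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = client.chat.completions.create(
        model="gpt-3.5-turbo",  # model choice is an assumption
        messages=[{"role": "user",
                   "content": f"Briefly summarize this conversation:\n{transcript}"}],
    ).choices[0].message.content
    return [{"role": "system", "content": f"Summary of earlier turns: {summary}"}] + recent
```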
Multi-Provider LLM Integration
Challenge
Supporting diverse LLM providers (OpenAI, Ollama, Jan AI) with different authentication and API formats.
Solution
Created a unified JanAIPromptNode class with an OpenAI-compatible interface and flexible authentication handling.
Production Deployment Complexity
Challenge
Coordinating Weaviate, FastAPI, and Next.js services with proper environment configuration and automatic document indexing.
Solution
Deployed on Railway with automatic document population at startup, internal networking configuration, and comprehensive health checks.
Semantic Search Reliability
Challenge
Ensuring consistent retrieval performance despite API failures or configuration issues.
Solution
Dual-search strategy with semantic near_text primary and BM25 fallback, plus extensive error handling and logging.
Key Achievements
Technical Implementation
- Complete RAG Pipeline: LangChain-powered document processing with 95% tutorial feature coverage
- Production Architecture: FastAPI + Next.js with streaming, authentication, and session management
- Multi-Modal Interface: Both CLI and web interfaces with feature parity
- Enterprise Security: API key auth, CORS, session isolation, and input validation
- Performance Optimization: <500ms retrieval, 32K token management, intelligent caching
Platform Features
- Document Intelligence: Semantic search over private document collections
- Conversational AI: Context-aware responses with memory management
- Multi-LLM Support: OpenAI, Ollama, Jan AI with unified interface
- Railway Deployment: Production-ready with automatic document indexing
- Real-time Streaming: Server-Sent Events for responsive user experience
Future Enhancements
Planned Features
- Query Contextualization: LLM-powered query rewriting based on conversation history
- Multi-step Retrieval: Agent-based systems for complex research questions
- Advanced Document Types: PDF, DOCX, HTML with specialized processing
- Hybrid Search: Combined semantic + keyword search with reranking
- Real-time Collaboration: Multi-user document annotation and sharing
Technical Improvements
- LangGraph Integration: Advanced workflow orchestration for complex tasks
- Vector Database Scaling: Distributed storage for enterprise document volumes
- Advanced Analytics: Query performance monitoring and user behavior tracking
- Enhanced Security: Role-based access control and document-level permissions
- Mobile Optimization: Progressive web app with offline capabilities
Deployment & Infrastructure
Production Hosting
Railway Platform
- Backend: FastAPI with uvicorn
- Frontend: Next.js with static generation
- Database: Weaviate with persistent volumes
- Environment: Automated variable management
Container Orchestration
Docker Compose Stack
- Services: Weaviate, API, Web
- Networks: Internal service communication
- Volumes: Document and vector persistence
- Health Checks: Automated service monitoring
Development Workflow
Modern Development Practices
- Version Control: Git with feature branches
- Environment Management: .env files with committed examples
- Local Development: Docker Compose stack
- Testing Strategy: Unit, integration, and manual testing
- Deployment: Automated Railway integration