AI Chat UI: A Learning Guide

🎯 Overview

This guide explains how our FastAPI-based FastOpp application implements a ChatGPT/Claude-like AI chat interface with advanced features like streaming responses, internet search integration, and database connectivity. You'll learn about the technical architecture, implementation strategies, and potential enhancements for building production-ready chat applications.

FastOpp is an open-source learning tool for AI applications. It ships with pre-built UI and admin components so students can spend their time on core AI concepts instead of configuring a full FastAPI stack, complete with authentication and UI, from scratch. Learn more at https://github.com/Oppkey/fastopp.

🧠 Core Chat Functionality

1. Streaming Responses for Interactivity

Why Streaming?

  • Instead of waiting for complete responses, streaming provides real-time feedback
  • Makes the chat feel more interactive and responsive
  • Reduces perceived latency and improves user experience

Technical Implementation:

  • Uses EventSourceResponse (provided by the sse-starlette package) in a FastAPI endpoint, fed by OpenRouter's streaming API
  • Sends chunks to the browser as they arrive from the LLM
  • Frontend processes data: events for real-time UI updates

Code Flow:

chat_with_llama_stream() → EventSourceResponse → Frontend processes data: events → Real-time UI updates
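
To make this flow concrete, here is a minimal sketch of such an endpoint. It assumes the sse-starlette package (which provides EventSourceResponse); stream_llm_chunks and the route path are illustrative stand-ins, not FastOpp's actual code.

from fastapi import FastAPI
from sse_starlette.sse import EventSourceResponse

app = FastAPI()

async def stream_llm_chunks(message: str):
    # Stand-in: the real service calls OpenRouter with stream=True
    # and yields each content delta as it arrives
    for chunk in ["Streaming ", "works ", "like ", "this."]:
        yield chunk

@app.get("/chat/stream")
async def chat_with_llama_stream(message: str):
    async def event_generator():
        async for chunk in stream_llm_chunks(message):
            # Each yielded dict becomes one SSE `data:` event on the wire
            yield {"data": chunk}
    return EventSourceResponse(event_generator())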

2. Chat History & Context Management

Context Window Strategy:

  • This app sends the last 20 messages to the LLM for context
  • Balances conversation continuity with token limits
  • Llama 3.3 70B has a 128k-token context window, but we limit context to 20 messages for fast responses and cost management

Smart Retrieval:

  • Uses get_conversation_context() to fetch recent messages in chronological order (see the sketch below)
  • Maintains conversation flow while managing token usage
  • Database design: messages stored with conversation_id, role, content, and created_at
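
The real logic lives in services/chat_history_service.py; this sketch only shows the general shape under an assumed SQLAlchemy schema, with the model and column types as illustrative guesses.

from datetime import datetime
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class Message(Base):
    __tablename__ = "messages"
    id: Mapped[int] = mapped_column(primary_key=True)
    conversation_id: Mapped[str]
    role: Mapped[str]  # "user" or "assistant"
    content: Mapped[str]
    created_at: Mapped[datetime]

async def get_conversation_context(
    session: AsyncSession, conversation_id: str, limit: int = 20
) -> list[dict]:
    # Fetch the N most recent messages, newest first...
    result = await session.execute(
        select(Message)
        .where(Message.conversation_id == conversation_id)
        .order_by(Message.created_at.desc())
        .limit(limit)
    )
    messages = list(result.scalars())
    # ...then flip back to chronological order for the LLM prompt
    messages.reverse()
    return [{"role": m.role, "content": m.content} for m in messages]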

3. Internet Search Integration

Current Implementation: Keyword-Based Triggering

  • Automatically detects when users ask about current events using keywords
  • Keywords: "latest", "current", "2024", "news", "price", "weather", etc.
  • Uses Tavily API for structured, LLM-optimized search results (a minimal trigger sketch follows)
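
At its core the trigger is a simple membership test like the sketch below; the function name and exact keyword set are illustrative, not FastOpp's literal code.

SEARCH_KEYWORDS = {"latest", "current", "2024", "news", "price", "weather"}

def needs_web_search(user_message: str) -> bool:
    # Fire a Tavily search whenever any trigger keyword appears
    lowered = user_message.lower()
    return any(keyword in lowered for keyword in SEARCH_KEYWORDS)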

Alternative Search Triggering Strategies:

Strategy 1: Intent Classification

# Use a lightweight ML model to classify user intent
def classify_search_intent(user_message: str) -> bool:
    # Train a model to detect when users need current information
    # More sophisticated than keyword matching
    pass

Strategy 2: LLM-Based Decision Making

# Ask the LLM itself if it needs current information
def should_search_web(user_message: str, llm_response: str) -> bool:
    # LLM analyzes if it needs current data to answer properly
    # More context-aware than keyword detection
    pass
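
One way to realize this strategy is a cheap pre-flight call made before generating the answer (dropping the llm_response argument above). The sketch below uses OpenRouter's OpenAI-compatible endpoint; the model name, prompt wording, and API key handling are assumptions.

from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible API, so the standard client works
client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_KEY")

def should_search_web(user_message: str) -> bool:
    # Ask the model to decide whether it needs fresh data before answering
    reply = client.chat.completions.create(
        model="meta-llama/llama-3.3-70b-instruct",
        messages=[
            {"role": "system",
             "content": "Answer YES or NO only: does answering this question "
                        "require up-to-date information from the web?"},
            {"role": "user", "content": user_message},
        ],
        max_tokens=3,
    )
    return reply.choices[0].message.content.strip().upper().startswith("Y")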

Strategy 3: Hybrid Approach

# Combine multiple signals
def should_search(user_message: str, conversation_history: list) -> bool:
    keyword_match = any(keyword in user_message.lower() for keyword in SEARCH_KEYWORDS)
    intent_score = classify_intent(user_message)
    llm_confidence = get_llm_confidence(user_message)

    return keyword_match or (intent_score > 0.7) or (llm_confidence < 0.5)
  • Add a "Search Web" button to the UI
  • Let users explicitly request web search
  • Most reliable but requires user interaction

Strategy 5: Time-Based Triggers

  • Automatically search for topics that are likely to change over time
  • Use embeddings to detect when users ask about "current" topics (see the sketch below)
  • More sophisticated than simple keyword matching
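
A hedged sketch of the embedding idea, assuming the sentence-transformers package; the model name, example phrasings, and threshold are all illustrative choices.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Reference phrasings for time-sensitive questions; extend as needed
TIME_SENSITIVE_EXAMPLES = [
    "what is the latest news about this topic",
    "what is the current price of this stock",
    "what is the weather today",
]
example_embeddings = model.encode(TIME_SENSITIVE_EXAMPLES, convert_to_tensor=True)

def is_time_sensitive(user_message: str, threshold: float = 0.5) -> bool:
    # Semantic similarity catches paraphrases that a keyword list misses
    query = model.encode(user_message, convert_to_tensor=True)
    score = util.cos_sim(query, example_embeddings).max().item()
    return score >= threshold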

🔍 Advanced Database Integration

1. Hybrid Search Integration

Current Implementation:

  • Vector database (FAISS) + full-text search (FTS5) + BM25 ranking (a score-fusion sketch follows)
  • Available in the search page for document retrieval
  • Can be integrated into chat for context-aware responses
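
FastOpp's exact fusion logic isn't shown in this guide, but weighted score fusion is one common way to combine the two retrievers. The sketch assumes each retriever returns (doc_id, score) pairs; the alpha weight is illustrative.

def fuse_results(vector_hits: list[tuple[str, float]],
                 bm25_hits: list[tuple[str, float]],
                 alpha: float = 0.5) -> list[tuple[str, float]]:
    # Min-max normalize each score list so the two scales are comparable
    def normalize(hits: list[tuple[str, float]]) -> dict[str, float]:
        if not hits:
            return {}
        scores = [score for _, score in hits]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0
        return {doc_id: (score - lo) / span for doc_id, score in hits}

    v, b = normalize(vector_hits), normalize(bm25_hits)
    combined = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * b.get(doc_id, 0.0)
        for doc_id in set(v) | set(b)
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)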

Integration Strategies:

Strategy A: Automatic Context Retrieval

def get_relevant_context(user_message: str, conversation_id: str) -> str:
    # Search for relevant documents/conversations
    search_results = hybrid_search_service.search(user_message, limit=5)

    # Format results for LLM context
    context = format_search_results_for_llm(search_results)
    return context
  • Add a "Search Knowledge Base" button
  • Let users explicitly search their data
  • Most reliable but requires user interaction

Strategy C: Smart Context Injection

def smart_context_injection(user_message: str, conversation_history: list) -> str:
    # Analyze if user is asking about specific topics
    if is_asking_about_customers(user_message):
        return get_customer_context(user_message)
    elif is_asking_about_projects(user_message):
        return get_project_context(user_message)
    # ... other domain-specific contexts
    return ""  # No domain-specific topic detected; inject nothing

2. Customer Database Integration

Use Cases:

  • "Find customers similar to Acme Corp"
  • "What products do our enterprise clients use?"
  • "Show me customers who haven't been contacted in 6 months"

Implementation Approach:

  1. Data Preparation: Convert customer data to embeddings (sketched below)
  2. Search Integration: Use hybrid search to find relevant customers
  3. Context Formatting: Format results for LLM consumption
  4. Response Enhancement: LLM provides human-like responses with data
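
Step 1 might look like the following sketch, again assuming sentence-transformers; the customer fields flattened into the text are illustrative.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def embed_customers(customers: list[dict]) -> list[list[float]]:
    # Flatten each record into one string so a single vector can represent it
    texts = [f"{c['name']}: {c['summary']}" for c in customers]
    return model.encode(texts).tolist()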

Example Implementation:

class CustomerContextService:
    def get_customer_context(self, query: str) -> str:
        # Search customer database using hybrid search
        results = self.hybrid_search.search(query, content_type="customer")

        # Format for LLM
        context = "Relevant customers:\n"
        for customer in results:
            context += f"- {customer.name}: {customer.summary}\n"

        return context
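
Because the retrieved records are injected as plain text, the LLM can ground its answer in concrete customer data instead of inventing details; step 4 is then just the normal chat call with this context prepended to the prompt.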

3. Document Integration

Document Types to Support:

  • PDFs (contracts, reports, manuals)
  • Word documents (proposals, specifications)
  • Text files (notes, procedures)
  • Web pages (company knowledge base)

Implementation Steps:

Step 1: Document Processing Pipeline

class DocumentProcessor:
    def process_document(self, file_path: str) -> List[Chunk]:
        # Extract text from document
        text = extract_text(file_path)

        # Split into chunks
        chunks = chunk_text(text)

        # Generate embeddings
        embeddings = generate_embeddings(chunks)

        # Store in database
        store_chunks(chunks, embeddings)

        # Return the chunks so callers can index or inspect them
        return chunks
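
chunk_text is left undefined above; one common baseline, sketched here, is fixed-size character chunks with overlap (both sizes are illustrative and usually tuned per corpus).

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    # Slide a fixed-size window with overlap so sentences that span a
    # boundary still appear intact in at least one chunk
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks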

Step 2: Document Search Integration

def search_documents(user_message: str) -> str:
    # Use hybrid search to find relevant documents
    results = hybrid_search.search(user_message, content_type="document")

    # Format for LLM context
    context = format_document_results(results)
    return context

Step 3: Smart Document Retrieval

def get_relevant_documents(user_message: str, conversation_history: list) -> str:
    # Analyze conversation context
    context_keywords = extract_keywords(conversation_history)

    # Search with context
    search_query = f"{user_message} {context_keywords}"
    results = search_documents(search_query)

    return results

🛠️ Learning Path: Building Advanced Chat Features

Phase 1: Core Functionality

  1. Study streaming implementation (services/chat_service.py)
  2. Understand context management (services/chat_history_service.py)
  3. Test different search triggering strategies

Phase 2: Database Integration

  1. Integrate hybrid search into chat responses
  2. Add customer database connectivity
  3. Implement document processing pipeline

Phase 3: Advanced Features

  1. Add intent classification for smarter search triggering
  2. Implement multi-modal document support
  3. Create domain-specific context injection

Phase 4: Production Features

  1. Add conversation analytics and user behavior tracking
  2. Implement conversation summarization
  3. Add conversation export and sharing

🏗️ Architecture Deep Dive

Current Architecture

User Input → Chat Service → OpenRouter API → Streaming Response
                 │
                 ├─→ Chat History Service → Database Storage
                 └─→ Web Search Service → Tavily API (when triggered)

Enhanced Architecture

User Input → Chat Service → OpenRouter API → Streaming Response
                 │
                 ├─→ Chat History Service → Database Storage
                 ├─→ Hybrid Search Service → Vector DB + FTS5
                 ├─→ Document Service → Document Processing
                 └─→ Customer Service → Customer Database

Key Services

  • ChatService: Core chat functionality and LLM integration
  • WebSearchService: Internet search with smart triggering
  • ChatHistoryService: Conversation management and context
  • HybridSearchService: Vector and text search capabilities
  • DocumentService: Document processing and retrieval
  • CustomerService: Customer data integration

📊 Implementation Strategies

1. Gradual Enhancement Approach

  • Start with existing functionality
  • Add one feature at a time
  • Test and validate each addition
  • Build on successful patterns

2. User-Centric Design

  • Focus on user experience
  • Make features discoverable
  • Provide clear feedback
  • Allow user control

3. Performance Considerations

  • Cache frequently accessed data (see the sketch after this list)
  • Optimize database queries
  • Use async/await patterns
  • Implement proper error handling
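
As one example of the caching and async points together, a small TTL cache in front of an expensive async lookup might look like this; the helper, key scheme, and TTL are all hypothetical.

import time

_cache: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 300  # Illustrative freshness window

async def get_customer_context(query: str) -> str:
    # Stand-in for the real (expensive) database or search lookup
    return f"context for {query}"

async def cached_customer_context(query: str) -> str:
    now = time.monotonic()
    hit = _cache.get(query)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]  # Fresh enough: skip the lookup entirely
    context = await get_customer_context(query)
    _cache[query] = (now, context)
    return context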

🎯 Next Steps for Learning

  1. Experiment with different search triggering strategies
  2. Integrate hybrid search into chat responses
  3. Add document processing capabilities
  4. Implement customer database connectivity
  5. Test and optimize performance
  6. Deploy to production environment

💡 Advanced Concepts

1. Multi-Agent Systems

  • Different agents for different tasks
  • Specialized search agents
  • Document analysis agents
  • Customer service agents

2. Context-Aware Responses

  • Understand conversation history
  • Maintain user preferences
  • Adapt to user behavior
  • Provide personalized responses

3. Real-Time Collaboration

  • Multiple users in same conversation
  • Real-time updates
  • Conflict resolution
  • Shared context

Remember: The best way to learn is by building! Start with small enhancements and gradually add more sophisticated features.