Project Overview
RecipeAgent is a real-time voice AI system that adopts a Gordon Ramsay persona while providing cooking assistance. It combines speech-to-text, a large language model, text-to-speech, and Retrieval-Augmented Generation (RAG) into an interactive cooking companion that answers questions from a cookbook PDF and performs measurement conversions.
Key Features
- Real-time Voice Interaction: Live audio processing with speech-to-text and text-to-speech capabilities
- Gordon Ramsay Persona: AI agent programmed with Gordon Ramsay's distinctive personality and communication style
- RAG-powered Cookbook Assistant: Answers cooking questions by retrieving relevant information from a local PDF cookbook
- Measurement Conversion Tool: Converts between different cooking measurements and units
- LiveKit Cloud Integration: Real-time media processing and job dispatch
- React Frontend: Modern web interface with live transcription display
Technical Architecture
The system is built with a microservices architecture featuring:
- Python Agent (LiveKit Agents): Core AI logic, persona management, tools, RAG, STT/TTS, and Voice Activity Detection
- Token Server (FastAPI): Secure JWT token issuance for frontend authentication
- React Frontend: User interface with real-time audio and transcription display
- LiveKit Cloud: Media SFU and job dispatch system
- LlamaIndex RAG: Document processing and vector search for cookbook queries
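To make the token server's role concrete, here is a minimal sketch of issuing a room-join JWT using only the Python standard library. The function name `issue_room_token`, its parameters, and the TTL default are hypothetical. The claim layout (`iss`, `sub`, `nbf`, `exp`, and a `video` grant) reflects the general shape of LiveKit access tokens as I understand it, not necessarily what this project's FastAPI server does; a real server would more likely call the official LiveKit server SDK.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def issue_room_token(api_key: str, api_secret: str, identity: str,
                     room: str, ttl_seconds: int = 600) -> str:
    """Build an HS256-signed JWT that lets `identity` join `room` (illustrative only)."""
    now = int(time.time())
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,                              # API key identifies the issuer
        "sub": identity,                             # participant identity
        "nbf": now,                                  # not valid before now
        "exp": now + ttl_seconds,                    # short-lived by design
        "video": {"room": room, "roomJoin": True},   # assumed room-join grant layout
    }
    # JWT signing input is base64url(header) + "." + base64url(claims).
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(api_secret.encode("utf-8"),
                         signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"
```

Keeping the API secret on the server and handing the browser only a short-lived token is what lets the React frontend join a room without ever seeing the credentials.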
Technologies Used
RAG Implementation
The system uses LlamaIndex for document processing and retrieval:
- Processes PDF cookbooks using SimpleDirectoryReader
- Uses OpenAI text-embedding-3-small for vector embeddings
- Implements SentenceSplitter with chunk_size=1024 and chunk_overlap=240
- Persists VectorStoreIndex locally with versioning to prevent data mismatches
- Provides context-aware responses with page citations when available
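The points above can be sketched as a load-or-build routine. The versioning scheme shown here (a hash of the embedding model, chunk settings, and PDF bytes used as the persist directory name) is an assumption, since the project only states that the index is versioned. The names `index_version` and `load_or_build_index` are hypothetical. Running `load_or_build_index` requires the `llama-index` packages and an OpenAI API key in the environment.

```python
import hashlib
import json
from pathlib import Path

EMBED_MODEL = "text-embedding-3-small"
CHUNK_SIZE = 1024
CHUNK_OVERLAP = 240


def index_version(pdf_bytes: bytes) -> str:
    """Fingerprint everything that determines the index contents (assumed scheme)."""
    config = json.dumps(
        {"embed_model": EMBED_MODEL, "chunk_size": CHUNK_SIZE, "chunk_overlap": CHUNK_OVERLAP},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(config + pdf_bytes).hexdigest()[:12]


def load_or_build_index(pdf_path: Path, storage_root: Path):
    """Reuse a persisted index only if its version directory exists; otherwise rebuild."""
    # Imported lazily so the fingerprint helper works without LlamaIndex installed.
    from llama_index.core import (Settings, SimpleDirectoryReader, StorageContext,
                                  VectorStoreIndex, load_index_from_storage)
    from llama_index.core.node_parser import SentenceSplitter
    from llama_index.embeddings.openai import OpenAIEmbedding

    Settings.embed_model = OpenAIEmbedding(model=EMBED_MODEL)
    persist_dir = storage_root / index_version(pdf_path.read_bytes())

    if persist_dir.exists():
        storage = StorageContext.from_defaults(persist_dir=str(persist_dir))
        return load_index_from_storage(storage)

    # No matching version on disk: parse the PDF, chunk it, embed, and persist.
    documents = SimpleDirectoryReader(input_files=[str(pdf_path)]).load_data()
    splitter = SentenceSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)
    index = VectorStoreIndex.from_documents(documents, transformations=[splitter])
    index.storage_context.persist(persist_dir=str(persist_dir))
    return index
```

Because the directory name changes whenever the PDF, the embedding model, or the chunking parameters change, an index built under old settings is never loaded by mistake; it is simply left behind and a fresh one is built.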
Workflow
1. User clicks "Start Call" in the frontend → token server issues JWT → frontend joins LiveKit room
2. LiveKit Cloud dispatches job to Python agent worker
3. Agent performs STT (Deepgram) → LLM reasoning (OpenAI) → optional tools → TTS (Cartesia/OpenAI)
4. Agent publishes audio and synchronized transcriptions to frontend
5. Frontend displays live transcripts and plays audio responses
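The control flow of step 3 can be sketched with stand-in callables in place of Deepgram, OpenAI, and Cartesia. Everything here (`LLMReply`, `run_turn`, and the callable signatures) is hypothetical and greatly simplified: the real agent streams audio and text incrementally through LiveKit Agents rather than handling one complete utterance at a time.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LLMReply:
    """Either final text, or a request to call a named tool first."""
    text: str = ""
    tool_name: Optional[str] = None
    tool_args: Optional[dict] = None


def run_turn(audio: bytes,
             stt: Callable[[bytes], str],
             llm: Callable[[str, Optional[str]], LLMReply],
             tts: Callable[[str], bytes],
             tools: dict) -> tuple:
    """Process one user utterance and return (reply text, synthesized audio)."""
    user_text = stt(audio)                        # speech-to-text
    reply = llm(user_text, None)                  # first LLM pass, no tool result yet
    if reply.tool_name is not None:               # optional tool call
        tool_result = tools[reply.tool_name](**(reply.tool_args or {}))
        reply = llm(user_text, tool_result)       # second pass sees the tool result
    # The text feeds the live transcript; the audio is played back to the user.
    return reply.text, tts(reply.text)
```

Passing the stages in as plain callables keeps the sketch testable with fakes, and mirrors the idea that the STT, LLM, and TTS providers are interchangeable parts of the same turn loop.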
Tools & Capabilities
- query_cookbook(question): RAG-powered cookbook question answering with context retrieval
- convert_measurements: Unit conversion tool; conversions between volume and weight assume a density of 1 g/ml (accurate for water, approximate for other ingredients)
- Voice Activity Detection: Silero VAD for turn detection and audio processing
- Live Transcription: Real-time text streams with topic 'lk.transcription'
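As an illustration of the convert_measurements tool and its 1 g/ml density assumption, here is a minimal sketch. The function name `convert_measurement`, the supported units, and the lookup tables are assumptions rather than the project's actual implementation; the conversion factors are standard US customary values.

```python
# Volume units expressed in millilitres, mass units in grams (US customary values).
VOLUME_ML = {"ml": 1.0, "l": 1000.0, "tsp": 4.92892, "tbsp": 14.7868, "cup": 236.588}
MASS_G = {"g": 1.0, "kg": 1000.0, "oz": 28.3495, "lb": 453.592}
DENSITY_G_PER_ML = 1.0  # the stated assumption; exact only for water-like ingredients


def convert_measurement(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert between cooking units, crossing volume and mass via the density assumption."""
    source, target = from_unit.lower(), to_unit.lower()

    # Normalize the input to millilitres first.
    if source in VOLUME_ML:
        ml = amount * VOLUME_ML[source]
    elif source in MASS_G:
        ml = amount * MASS_G[source] / DENSITY_G_PER_ML
    else:
        raise ValueError(f"unknown unit: {from_unit}")

    # Then express the millilitres in the requested unit.
    if target in VOLUME_ML:
        return ml / VOLUME_ML[target]
    if target in MASS_G:
        return ml * DENSITY_G_PER_ML / MASS_G[target]
    raise ValueError(f"unknown unit: {to_unit}")
```

With a single fixed density, `convert_measurement(1, "cup", "g")` treats a cup of flour the same as a cup of water, so volume-to-weight results for dry or dense ingredients are rough estimates only.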