Abhinav Balla

RecipeAgent - Real-time Voice AI (Gordon Ramsay)

A real-time voice agent that talks like Gordon Ramsay, answers cookbook questions using RAG, and performs measurement conversions.

Project Overview

RecipeAgent is a real-time voice AI system that adopts Gordon Ramsay's persona while providing cooking assistance. It combines speech-to-text, a large language model, text-to-speech, and Retrieval-Augmented Generation (RAG) into an interactive cooking companion that answers questions from a cookbook PDF and performs measurement conversions.

Key Features

  • Real-time Voice Interaction: Live audio processing with speech-to-text and text-to-speech capabilities
  • Gordon Ramsay Persona: AI agent programmed with Gordon Ramsay's distinctive personality and communication style
  • RAG-powered Cookbook Assistant: Answers cooking questions by retrieving relevant information from a local PDF cookbook
  • Measurement Conversion Tool: Converts between different cooking measurements and units
  • LiveKit Cloud Integration: Real-time media processing and job dispatch
  • React Frontend: Modern web interface with live transcription display

Technical Architecture

The system is built with a microservices architecture featuring:

  • Python Agent (LiveKit Agents): Core AI logic, persona management, tools, RAG, STT/TTS, and Voice Activity Detection
  • Token Server (FastAPI): Secure JWT token issuance for frontend authentication
  • React Frontend: User interface with real-time audio and transcription display
  • LiveKit Cloud: Media SFU and job dispatch system
  • LlamaIndex RAG: Document processing and vector search for cookbook queries

Technologies Used

Python, LiveKit Agents, React, FastAPI, LlamaIndex, OpenAI GPT-4, Deepgram STT, Cartesia TTS, Vector Search, RAG

RAG Implementation

The system uses LlamaIndex for document processing and retrieval:

  • Processes PDF cookbooks using SimpleDirectoryReader
  • Uses OpenAI text-embedding-3-small for vector embeddings
  • Implements SentenceSplitter with chunk_size=1024 and chunk_overlap=240
  • Persists VectorStoreIndex locally with versioning to prevent data mismatches
  • Provides context-aware responses with page citations when available
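
The indexing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names (index_fingerprint, build_or_load_index) and the storage layout are hypothetical, and the "versioning" shown here (a hash of the PDF bytes plus the chunking and embedding settings, used as the persist directory name) is only one plausible way to avoid loading an index built from different data. Running build_or_load_index assumes the llama-index packages are installed and an OpenAI API key is configured.

```python
import hashlib
import json
from pathlib import Path

# Settings taken from the list above.
CHUNK_SIZE = 1024
CHUNK_OVERLAP = 240
EMBED_MODEL = "text-embedding-3-small"


def index_fingerprint(pdf_path: Path) -> str:
    """Hypothetical version key: hash of the PDF bytes plus indexing settings."""
    digest = hashlib.sha256()
    digest.update(pdf_path.read_bytes())
    settings = {
        "chunk_size": CHUNK_SIZE,
        "chunk_overlap": CHUNK_OVERLAP,
        "embed_model": EMBED_MODEL,
    }
    digest.update(json.dumps(settings, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()[:16]


def build_or_load_index(pdf_path: Path, storage_root: Path):
    """Load the persisted index for this PDF/settings pair, or build it once."""
    # Imported lazily so the fingerprint helper works without LlamaIndex installed.
    from llama_index.core import (
        SimpleDirectoryReader,
        StorageContext,
        VectorStoreIndex,
        load_index_from_storage,
    )
    from llama_index.core.node_parser import SentenceSplitter
    from llama_index.embeddings.openai import OpenAIEmbedding

    embed_model = OpenAIEmbedding(model=EMBED_MODEL)
    persist_dir = storage_root / index_fingerprint(pdf_path)

    if persist_dir.exists():
        # Same PDF and same settings as a previous run: reuse the stored vectors.
        storage = StorageContext.from_defaults(persist_dir=str(persist_dir))
        return load_index_from_storage(storage, embed_model=embed_model)

    docs = SimpleDirectoryReader(input_files=[str(pdf_path)]).load_data()
    splitter = SentenceSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)
    index = VectorStoreIndex.from_documents(
        docs, transformations=[splitter], embed_model=embed_model
    )
    index.storage_context.persist(persist_dir=str(persist_dir))
    return index
```

With this layout, changing the cookbook or any chunking parameter produces a new directory name, so an old index is never silently reused against new data.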

Workflow

1. User clicks "Start Call" in the frontend → token server issues JWT → frontend joins LiveKit room
2. LiveKit Cloud dispatches job to Python agent worker
3. Agent runs STT (Deepgram) → LLM reasoning (OpenAI) → optional tool calls → TTS (Cartesia/OpenAI)
4. Agent publishes audio and synchronized transcriptions to frontend
5. Frontend displays live transcripts and plays audio responses
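
To make step 1 more concrete, here is a standard-library sketch of the kind of JWT the token server might issue. It is an illustration under assumptions, not the project's implementation: a real token server would more likely call the LiveKit server SDK, and the claim layout below (iss as the API key, sub as the participant identity, a video grant with room and roomJoin) reflects my understanding of LiveKit's access-token format and should be checked against the LiveKit documentation. The function name, parameters, and default TTL are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def issue_room_token(api_key, api_secret, identity, room, ttl_seconds=600, now=None):
    """Return an HS256-signed JWT that lets `identity` join `room`."""
    issued = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,               # assumed: API key identifies the issuer
        "sub": identity,              # assumed: participant identity
        "nbf": issued,
        "exp": issued + ttl_seconds,  # short-lived token
        "video": {"room": room, "roomJoin": True},  # assumed grant layout
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        api_secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return signing_input + "." + _b64url(signature)
```

The point of the design is that the API secret stays on the server: the frontend only ever receives a short-lived, room-scoped token.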

Tools & Capabilities

  • query_cookbook(question): RAG-powered cookbook question answering with context retrieval
  • convert_measurements: Unit conversion tool; conversions between volume and mass assume a density of 1 g/ml (water)
  • Voice Activity Detection: Silero VAD for turn detection and audio processing
  • Live Transcription: Real-time text streams published on the 'lk.transcription' topic
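
As an illustration of how a conversion tool with the 1 g/ml assumption could work, the sketch below normalizes every quantity to millilitres and then converts to the target unit. The function name, the supported units, and the rounded US customary factors are assumptions chosen for the example; the project's actual convert_measurements tool may differ.

```python
# Millilitres per volume unit (US customary values, rounded).
ML_PER_UNIT = {
    "ml": 1.0,
    "l": 1000.0,
    "tsp": 4.92892,
    "tbsp": 14.7868,
    "fl_oz": 29.5735,
    "cup": 236.588,
}

# Grams per mass unit.
G_PER_UNIT = {"g": 1.0, "kg": 1000.0, "oz": 28.3495, "lb": 453.592}

# The simplifying assumption from the tool description: everything is as dense as water.
DENSITY_G_PER_ML = 1.0


def convert_measurement(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert between volume and mass units via millilitres."""
    from_unit, to_unit = from_unit.lower(), to_unit.lower()

    # Step 1: express the input as millilitres.
    if from_unit in ML_PER_UNIT:
        ml = amount * ML_PER_UNIT[from_unit]
    elif from_unit in G_PER_UNIT:
        ml = amount * G_PER_UNIT[from_unit] / DENSITY_G_PER_ML
    else:
        raise ValueError(f"unsupported unit: {from_unit}")

    # Step 2: express the millilitres in the target unit.
    if to_unit in ML_PER_UNIT:
        return ml / ML_PER_UNIT[to_unit]
    if to_unit in G_PER_UNIT:
        return ml * DENSITY_G_PER_ML / G_PER_UNIT[to_unit]
    raise ValueError(f"unsupported unit: {to_unit}")
```

For example, convert_measurement(250, "ml", "g") returns 250.0. The same assumption overstates the weight of ingredients that are less dense than water, such as flour, so a per-ingredient density table would be a natural refinement.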