DeepSeek-R1-Voice-Agent icon indicating copy to clipboard operation
DeepSeek-R1-Voice-Agent copied to clipboard

An interactive AI voice agent that can capture and transcribe speech in real-time, generate intelligent responses using the DeepSeek R1 (7B model) AI, and convert the responses back to natural speech...

DeepSeek R1 AI Voice Agent

A real-time AI voice assistant powered by DeepSeek R1 that enables seamless voice conversations through speech-to-text transcription, AI response generation, and text-to-speech synthesis.

🌟 Overview

This project creates an interactive AI voice agent that:

  • Captures and transcribes speech in real-time using AssemblyAI
  • Generates intelligent responses using DeepSeek R1 (7B model) via Ollama
  • Converts AI responses back to natural speech using ElevenLabs
  • Streams audio responses for immediate playback

✨ Features

  • Real-time Speech Recognition: High-quality speech-to-text transcription with AssemblyAI
  • Advanced AI Responses: Powered by DeepSeek R1's reasoning capabilities
  • Natural Voice Synthesis: Professional text-to-speech with ElevenLabs
  • Streaming Audio Playback: Low-latency audio streaming for responsive conversations
  • Conversation Memory: Maintains context throughout the conversation
  • Cross-platform Support: Works on macOS, Linux, and Windows

πŸ”§ Prerequisites

API Keys Required

System Dependencies

Install Ollama

Download and install Ollama from ollama.com

Install PortAudio

Ubuntu/Debian:

sudo apt update && sudo apt install portaudio19-dev

macOS:

brew install portaudio

Windows: PortAudio is typically included with the Python package installation.

Install MPV (macOS only)

brew install mpv

πŸ“¦ Installation

1. Clone the Repository

git clone https://github.com/danieladdisonorg/DeepSeek-R1-Voice-Agent.git
cd DeepSeek-R1-Voice-Agent

2. Install Python Dependencies

pip install "assemblyai[extras]" ollama elevenlabs

3. Download DeepSeek R1 Model

ollama pull deepseek-r1:7b

4. Configure API Keys

Edit AIVoiceAgent.py and replace the placeholder API keys:

aai.settings.api_key = "YOUR_ASSEMBLYAI_API_KEY"
self.client = ElevenLabs(api_key="YOUR_ELEVENLABS_API_KEY")

πŸš€ Usage

Start the Voice Agent

python AIVoiceAgent.py

Interaction Flow

  1. Speak: The agent listens for your voice input
  2. Processing: Your speech is transcribed and sent to DeepSeek R1
  3. Response: The AI generates a response (limited to 300 characters for quick interactions)
  4. Playback: The response is converted to speech and played back
  5. Continue: The conversation continues with maintained context

Stopping the Agent

Press Ctrl+C to stop the voice agent.

βš™οΈ Configuration

Model Settings

  • AI Model: DeepSeek R1 7B (configurable in the code)
  • Voice Model: ElevenLabs Turbo v2 (configurable)
  • Response Length: Limited to 300 characters (adjustable in system prompt)
  • Sample Rate: 16kHz for optimal quality

Customization Options

  • Modify the system prompt in AIVoiceAgent.py to change AI behavior
  • Adjust response length limits
  • Change voice models in ElevenLabs configuration
  • Modify audio streaming parameters

πŸ” Troubleshooting

Common Issues

"No module named 'assemblyai'"

pip install "assemblyai[extras]"

"Ollama connection error"

  • Ensure Ollama is running: ollama serve
  • Verify the model is downloaded: ollama list

"Audio device not found"

  • Check microphone permissions
  • Verify PortAudio installation
  • Test microphone with other applications

"ElevenLabs API error"

  • Verify API key is correct
  • Check API quota/usage limits
  • Ensure stable internet connection

Performance Tips

  • Use a quality microphone for better transcription accuracy
  • Ensure stable internet connection for API calls
  • Close unnecessary applications to free up system resources

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Microphone    │───▢│  AssemblyAI  │───▢│   DeepSeek R1   β”‚
β”‚   (Audio Input) β”‚    β”‚ (Speech-to-  β”‚    β”‚ (AI Response    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚  Text)       β”‚    β”‚  Generation)    β”‚
                       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                      β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”              β”‚
β”‚   Speakers      │◀───│  ElevenLabs  β”‚β—€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ (Audio Output)  β”‚    β”‚ (Text-to-    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚  Speech)     β”‚
                       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ“„ License

This project is open source. Please check the repository for license details.

🀝 Contributing

Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.

πŸ“ž Support

For issues and questions:

  • Open an issue on GitHub
  • Check the troubleshooting section above
  • Review API documentation for AssemblyAI, Ollama, and ElevenLabs

Note: This project requires active internet connection for API services and sufficient system resources to run the DeepSeek R1 model locally via Ollama.