agents
Build real-time multimodal AI applications 🤖🎙️📹
While language specification is only supported for Turbo v2.5, it can be a helpful constraint. We currently override the ElevenLabs TTS settings ourselves to do this, and are upstreaming that change here.
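If the plugin exposes a language option, usage might look like the sketch below (an assumption: the `language` parameter name and its Turbo-v2.5-only behavior come from the issue text, not from a confirmed API):

```python
from livekit.plugins import elevenlabs

# Hedged sketch: assumes the ElevenLabs plugin accepts a `language`
# parameter, which only takes effect with the eleven_turbo_v2_5 model.
tts = elevenlabs.TTS(
    model="eleven_turbo_v2_5",
    language="de",  # ISO 639-1 code constraining the spoken language
)
```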
Hello team, I'm the maintainer of [Anteon](https://github.com/getanteon/anteon). We have created Gurubase.io with the mission of building a centralized, open-source tool-focused knowledge base. Essentially, each "guru" is equipped with custom knowledge...
I want to use Azure STT and pass a list of languages that the input audio could be in. I want Azure to identify the language from the provided list...
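For reference, the underlying Azure Speech SDK already supports this via `AutoDetectSourceLanguageConfig`; a minimal sketch outside the LiveKit plugin, with placeholder credentials:

```python
import azure.cognitiveservices.speech as speechsdk

# Language identification from a candidate list: Azure picks the best
# match instead of assuming a single fixed locale.
speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")
auto_detect = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
    languages=["en-US", "de-DE", "es-ES"]
)
recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    auto_detect_source_language_config=auto_detect,
)
result = recognizer.recognize_once()
print(result.text)
```

The open question is exposing an equivalent candidate-language list through the plugin's STT options.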
The [Wizper API](https://fal.ai/models/fal-ai/wizper/api#queue-submit) has a fairly low word-error rate (see https://artificialanalysis.ai/speech-to-text). We'd like to have a plugin for it (if it doesn't work with OpenAI's STT plugin).
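Until a plugin exists, the model can be called through fal's queue API; a sketch using the `fal_client` package, where the argument names for `fal-ai/wizper` are assumptions based on the linked API page:

```python
import fal_client  # pip install fal-client; reads FAL_KEY from the env

# Submit audio to the queue and block until the transcription is ready.
result = fal_client.subscribe(
    "fal-ai/wizper",
    arguments={
        "audio_url": "https://example.com/sample.wav",
        "task": "transcribe",
    },
)
print(result["text"])  # assumed shape of the response payload
```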
**Problem Description:** I have an agent process that currently uses around 50MB of resident memory (RES). I need to scale this system to handle 1,000 rooms simultaneously, and if each...
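Back-of-envelope arithmetic for the numbers quoted above, assuming the truncated question is heading toward one ~50 MB process per room:

```python
# Rough aggregate memory estimate: one agent process per room.
res_per_process_mb = 50           # observed resident memory per agent
rooms = 1_000                     # target concurrency
total_gb = res_per_process_mb * rooms / 1024
print(f"~{total_gb:.0f} GB of RAM")  # ~49 GB across the fleet
```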
First, thanks for creating such an excellent tool! I was wondering if there are any plans to add support for speaker diarization when using plugins (e.g., [Google](https://cloud.google.com/speech-to-text/docs/multiple-voices), [Deepgram](https://developers.deepgram.com/docs/diarization), [Azure](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-stt-diarization?tabs=macos)). This...
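For comparison, diarization with Deepgram's pre-recorded API is a single query parameter (see the linked docs); the open question is surfacing the per-word speaker labels through the plugin interface:

```python
import requests

# Deepgram pre-recorded transcription with diarization enabled.
resp = requests.post(
    "https://api.deepgram.com/v1/listen",
    params={"diarize": "true", "punctuate": "true"},
    headers={"Authorization": "Token <DEEPGRAM_API_KEY>"},
    json={"url": "https://example.com/call.wav"},
)
words = resp.json()["results"]["channels"][0]["alternatives"][0]["words"]
for w in words:
    print(w["speaker"], w["word"])  # each word carries a speaker index
```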
It's desirable to store the entire conversation state from the realtime model, e.g. when disconnecting or when resuming a conversation. We should make it easy to access this on the...
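A hypothetical sketch of what easy access could look like; `chat_ctx` and the message fields below are illustrative assumptions, not a confirmed public API:

```python
import json

def save_conversation(agent, path: str) -> None:
    # Snapshot the conversation so it can be restored on reconnect.
    history = [
        {"role": m.role, "content": m.content}
        for m in agent.chat_ctx.messages
    ]
    with open(path, "w") as f:
        json.dump(history, f)
```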
When I don't add a name in the worker options, the agent joins the room with no problems. But if I add the agent_name in the `WorkerOptions`, then the...
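For context, setting `agent_name` typically switches the worker to explicit dispatch, so the agent no longer joins rooms automatically and must be dispatched via the server API. A sketch using the `livekit-api` Python package (treat the method and request names as assumptions if your version differs):

```python
from livekit import api

async def dispatch_agent(room_name: str) -> None:
    # With agent_name set on WorkerOptions, request the agent explicitly.
    lkapi = api.LiveKitAPI()  # reads LIVEKIT_URL / API key / secret from env
    await lkapi.agent_dispatch.create_dispatch(
        api.CreateAgentDispatchRequest(agent_name="my-agent", room=room_name)
    )
    await lkapi.aclose()
```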
### Overview
This PR introduces a new feature to control the automatic linking of participants to the MultimodalAgent when they join a room or when the agent starts. By default,...
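A hypothetical sketch of the opt-out described above; the `auto_link_participants` parameter name is illustrative, not necessarily the PR's actual API:

```python
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai

agent = MultimodalAgent(model=openai.realtime.RealtimeModel())
# Illustrative flag: disable auto-linking and attach participants
# manually later; `room` comes from the job context.
agent.start(room, auto_link_participants=False)
```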
Hi, I'm trying to use Ollama for local inference, but I can't seem to get the code to point to the correct endpoint. I'm using:

```python
assistant = VoicePipelineAgent(
    vad=ctx.proc.userdata["vad"],
    ...
)
```
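One common way to wire this up is to point the OpenAI-compatible LLM plugin at Ollama's `/v1` endpoint; a sketch assuming the `livekit-plugins-openai` package and a local Ollama server on its default port:

```python
from livekit.plugins import openai

# Ollama serves an OpenAI-compatible API at /v1; `with_ollama` exists
# in recent livekit-plugins-openai releases (treat as an assumption).
llm = openai.LLM.with_ollama(
    model="llama3.1",
    base_url="http://localhost:11434/v1",
)
```

The resulting `llm` can then be passed as the `llm=` argument to `VoicePipelineAgent`.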