Quickstart on the doc doesn't work. "Waiting for audio track" forever
I was trying to learn the agents framework, but even the quickstart in the documentation doesn't work. I followed the steps and set up the Deepgram API key. The agent playground works: I was able to join the room and it shows Agent connected as true, but the status stays at "starting" forever and the audio section shows "Waiting for audio track" forever. I'm using a Mac M3 with Chrome. Please help me out. Thanks in advance.
We will soon be introducing an update where you can register track callbacks before the room is connected. There is currently a race condition where you could miss track subscription events. The race still exists in: https://github.com/livekit/agents/blob/main/examples/speech-to-text/agent.py but it's greatly minimized.
Could you try that with the latest packages?
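For anyone wondering what the race looks like, here is a minimal sketch in plain Python. It does not use the LiveKit SDK: `FakeRoom`, `on_track_subscribed`, `subscribe`, and `watch_audio_tracks` are hypothetical names made up for illustration. The idea is that a handler registered after a track has already been subscribed never sees that track, so a common workaround is to register the handler and then also walk the tracks that already exist.

```python
from typing import Callable, Dict, List


class FakeRoom:
    """Hypothetical stand-in for a room object; this is NOT the LiveKit API."""

    def __init__(self) -> None:
        self.tracks: Dict[str, str] = {}  # track id -> kind ("audio" / "video")
        self._handlers: List[Callable[[str, str], None]] = []

    def on_track_subscribed(self, handler: Callable[[str, str], None]) -> None:
        self._handlers.append(handler)

    def subscribe(self, track_id: str, kind: str) -> None:
        # Record the track, then notify whoever is listening right now.
        self.tracks[track_id] = kind
        for handler in list(self._handlers):
            handler(track_id, kind)


def watch_audio_tracks(room: FakeRoom, on_audio: Callable[[str], None]) -> None:
    """Deliver every audio track exactly once, including ones that arrived early."""
    seen = set()

    def handle(track_id: str, kind: str) -> None:
        if kind == "audio" and track_id not in seen:
            seen.add(track_id)
            on_audio(track_id)

    # 1. Register for future events first...
    room.on_track_subscribed(handle)
    # 2. ...then replay tracks that were subscribed before we registered.
    for track_id, kind in list(room.tracks.items()):
        handle(track_id, kind)


room = FakeRoom()
room.subscribe("TR_early", "audio")  # arrives before any handler exists
received: List[str] = []
watch_audio_tracks(room, received.append)
room.subscribe("TR_late", "audio")
room.subscribe("TR_video", "video")  # ignored: not audio
print(received)
```

Without step 2, `TR_early` would be missed, which is one way to end up waiting on an audio track forever. Being able to register callbacks before the room connects removes the need for the replay step.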
Edit 1
From my testing, it seems like the issue is coming from text-to-speech, not speech-to-text.
# This line is not actually sending back any audio to the room.
await assistant.say("Hey, how can I help you today?", allow_interruptions=False)
The "Starting" status from the OP seems like a red herring. I think the hosted demo example is working fine even though the status shown is incorrect.
original comment
+1 I'm in the exact same spot. @keepingitneil I will try your attached example, but would recommend making sure the example version on the official site is working: https://docs.livekit.io/agents/quickstart/
We've updated the quickstart guide to reflect a better demo - an end-to-end voice assistant. Give it a shot!
@davidzhao was it updated in the last few hours? My code is identical to this guide and I'm having issues.
I think I'm having the same problem, on M2.
I'm running livekit-server in dev mode from brew and the voice-assistant example in Docker using Podman. I generate a token, then visit the playground to connect to my local instance (ws://localhost:7880 or http://localhost:7880).
The status bar says the agent never actually connects, so I suspect it hangs:
The last message from the Python agent runner mentions the following:
{"asctime": "2024-05-16 21:33:03,200", "level": "WARNING", "name": "livekit.agents", "message": "Running <Task pending name='Task-23' coro=<entrypoint() running at /home/appuser/function_calling.py:104> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[_start.<locals>._start_if_valid.<locals>.log_exception() at /home/appuser/.local/lib/python3.11/site-packages/livekit/agents/ipc/job_main.py:99]> took too long: 2.43 seconds", "job_id": "AJ_BTaWgTXgvU2R", "pid": 7}
The 3 offending lines around line 104 are:
If I comment out line 105, the assistant.say() call, and rebuild the container:
OK, I found out that the UI renders the "Agent is starting" message for as long as there has been no initial response from the agent. I had to start speaking during the loading state to trigger a request to the LLM API, and only then did I find out that the connection wasn't working. The gpt-4-turbo model returns a 404 when your OpenAI account has not been credited with funds yet. When I switched to the gpt-3.5-turbo model, it made some progress.
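In case it helps others hitting the same 404, below is a rough pre-flight check that could be run before starting the agent. It is only a sketch: `model_status` and `first_usable_model` are hypothetical helper names, and it assumes that the OpenAI retrieve-model endpoint (GET /v1/models/{model}) reports the same access that the chat endpoint enforces, which I have not verified for every account state.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional, Sequence


def model_status(model: str, api_key: str) -> int:
    """Return the HTTP status of GET /v1/models/{model} on the OpenAI API."""
    req = urllib.request.Request(
        f"https://api.openai.com/v1/models/{model}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; report the code instead.
        return err.code


def first_usable_model(
    candidates: Sequence[str], status_of: Callable[[str], int]
) -> Optional[str]:
    """Return the first candidate whose status check is 200, else None."""
    for model in candidates:
        if status_of(model) == 200:
            return model
    return None


# Example wiring (not executed here; needs network access and a real key):
#   import os
#   key = os.environ["OPENAI_API_KEY"]
#   model = first_usable_model(
#       ["gpt-4-turbo", "gpt-3.5-turbo"], lambda m: model_status(m, key)
#   )
```

The status lookup is passed in as a callable so the selection logic can be tried without network access. If the result is `None`, none of the candidate models is usable with that key, and the agent would fail in the same silent way described above.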
I've removed agent_status from the Playground UI here: https://github.com/livekit/agents-playground/pull/58 because it causes confusion.
The above issues, as @Tails found, are likely due to ElevenLabs, OpenAI, or Deepgram API keys or credits.
@technoligest / @takan1 looking at the ElevenLabs docs, it seems like PCM audio needs the Pro tier or above (https://elevenlabs.io/docs/api-reference/streaming), which has changed since we implemented it (https://web.archive.org/web/20231211083946/https://elevenlabs.io/docs/api-reference/streaming)
On our end we'll look into defaulting to the mp3 streaming output formats.