Quickstart on the doc doesn't work. "Waiting for audio track" forever

Open takan1 opened this issue 1 year ago • 7 comments

I was trying to learn the agents framework, but even the quickstart in the documentation doesn't work. I followed the steps and set up the Deepgram API key. The agent playground works: I was able to join the room and it shows "Agent connected: true", but the status stays at "starting" forever and the audio section shows "Waiting for audio track" forever. I'm using a Mac M3 with Chrome. Please help me out. Thanks in advance.

takan1 avatar May 10 '24 01:05 takan1

We will soon be introducing an update where you can register track callbacks before the room is connected. There is currently a race condition where you could miss track subscription events. The race still exists in: https://github.com/livekit/agents/blob/main/examples/speech-to-text/agent.py but it's greatly minimized.

Could you try that with the latest packages?
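To illustrate the race described above, here is a minimal asyncio sketch. `FakeRoom`, `on_track`, and the `"audio-track"` value are hypothetical stand-ins invented for this illustration; they are not the LiveKit API. The sketch only shows why a handler registered after `connect()` can miss an event that fires during connection.

```python
import asyncio


class FakeRoom:
    """Hypothetical stand-in for a room object; NOT the LiveKit API."""

    def __init__(self):
        self._handlers = []

    def on_track(self, handler):
        # Register a callback for "track subscribed" style events.
        self._handlers.append(handler)

    async def connect(self):
        # Simulate the remote track being announced while connecting.
        await asyncio.sleep(0)
        for handler in self._handlers:
            handler("audio-track")


async def register_after_connect():
    room, seen = FakeRoom(), []
    await room.connect()        # the event fires here, nobody is listening
    room.on_track(seen.append)  # registered too late, the event is lost
    return seen


async def register_before_connect():
    room, seen = FakeRoom(), []
    room.on_track(seen.append)  # handler is in place before connecting
    await room.connect()
    return seen


missed = asyncio.run(register_after_connect())
caught = asyncio.run(register_before_connect())
print(missed, caught)  # [] ['audio-track']
```

In this toy model the only fix is ordering: attach handlers first, then connect.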

keepingitneil avatar May 14 '24 20:05 keepingitneil

Edit 1

From my testing, it seems like the issue is coming from text-to-speech, not speech-to-text.

    # This line is not actually sending back any audio to the room.
    await assistant.say("Hey, how can I help you today?", allow_interruptions=False)

The "Starting" status from OP seems like a red herring. I think the hosted demo example is working fine even with the status being incorrect.

original comment

+1 I'm in the exact same spot. @keepingitneil I will try your attached example, but would recommend making sure the example version on the official site is working: https://docs.livekit.io/agents/quickstart/

technoligest avatar May 16 '24 00:05 technoligest

We've updated the quickstart guide to reflect a better demo - an end-to-end voice assistant. Give it a shot!

davidzhao avatar May 16 '24 04:05 davidzhao

@davidzhao was it updated in the last few hours? My code is identical to this guide and I'm having issues.

technoligest avatar May 16 '24 04:05 technoligest

I think I'm having the same problem, on M2.

I'm running livekit-server in dev mode, installed from brew, and running the voice-assistant example in Docker using Podman. I generate a token and then visit the playground to connect to my local instance (ws://localhost:7880 or http://localhost:7880).

The status bar says the agent never actually connects, so I suspect it hangs:

image

The last message from the Python agent runner mentions the following:

{"asctime": "2024-05-16 21:33:03,200", "level": "WARNING", "name": "livekit.agents", "message": "Running <Task pending name='Task-23' coro=<entrypoint() running at /home/appuser/function_calling.py:104> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[_start.._start_if_valid..log_exception() at /home/appuser/.local/lib/python3.11/site-packages/livekit/agents/ipc/job_main.py:99]> took too long: 2.43 seconds", "job_id": "AJ_BTaWgTXgvU2R", "pid": 7}

The 3 offending lines around line 104 are:

image

If I comment out line 105, the assistant.say() call, and rebuild the container:

image

Tails avatar May 16 '24 21:05 Tails

Ok, I found out that the UI shows the "Agent is starting" message for as long as the agent has not sent an initial response. I had to start speaking during the loading state to trigger a request to the LLM API, after which I found out that the connection to the API wasn't working. The gpt-4-turbo model gives a 404 when your OpenAI account has not been funded with credits yet. When I switched to the gpt-3.5-turbo model, it made some progress.
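The workaround above (falling back from gpt-4-turbo to gpt-3.5-turbo) can be sketched as a small helper. This is only an illustration: `ModelUnavailable`, `first_working_model`, and `fake_probe` are hypothetical names, and `fake_probe` merely simulates the 404. A real probe would make a small request to the provider and translate its error into the exception.

```python
class ModelUnavailable(Exception):
    """Hypothetical error raised when a model cannot be used by the account."""


def first_working_model(models, probe):
    """Return the first model for which probe(model) succeeds, plus errors seen."""
    errors = {}
    for model in models:
        try:
            probe(model)
            return model, errors
        except ModelUnavailable as exc:
            # Remember why this model was skipped so the failure is visible.
            errors[model] = str(exc)
    raise RuntimeError(f"no usable model: {errors}")


def fake_probe(model):
    # Simulates the behaviour reported above; not a real API call.
    if model == "gpt-4-turbo":
        raise ModelUnavailable("404: model not available for this account")


chosen, errors = first_working_model(["gpt-4-turbo", "gpt-3.5-turbo"], fake_probe)
print(chosen)  # gpt-3.5-turbo
print(errors)
```

Surfacing the collected errors at startup would make this kind of failure visible before the first spoken request.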

Tails avatar May 17 '24 07:05 Tails

I've removed agent_status from the Playground UI here: https://github.com/livekit/agents-playground/pull/58 because it causes confusion.

The above issues, as @Tails found, are likely due to ElevenLabs, OpenAI, or Deepgram API keys or credits.
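A quick preflight check can rule out missing keys before starting the agent. This is a minimal sketch: the environment variable names in `REQUIRED_KEYS` are assumptions and should be adjusted to whatever your plugins actually read. It only checks that the keys are set; it cannot tell whether the account has credits or the right tier.

```python
import os

# Assumed variable names; adjust to match the plugins you use.
REQUIRED_KEYS = ("OPENAI_API_KEY", "DEEPGRAM_API_KEY", "ELEVEN_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All expected API keys are set.")
```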

@technoligest / @takan1 looking at the ElevenLabs docs, it seems PCM audio now needs the pro tier or above (https://elevenlabs.io/docs/api-reference/streaming), which has changed since we implemented it (https://web.archive.org/web/20231211083946/https://elevenlabs.io/docs/api-reference/streaming).

On our end we'll look into defaulting to the mp3 streaming output formats.

keepingitneil avatar May 20 '24 21:05 keepingitneil