Azure TTS or STT: WebSocket upgrade failed (with westeurope endpoints)

Open remisharrock opened this issue 8 months ago • 5 comments

On Windows, I installed the latest versions of the packages (using >=0) in a clean, fresh environment:

pip install "livekit-agents[azure,openai,silero,turn-detector]>=0" "python-dotenv"

I got the latest versions (as of today), 1.0.16, for example:

pip show livekit-plugins-azure
Name: livekit-plugins-azure
Version: 1.0.16
...
pip show livekit-agents
Name: livekit-agents
Version: 1.0.16

with a simple .env:

OPENAI_API_KEY=XXXX
LIVEKIT_URL=wss://XXXX.livekit.cloud
LIVEKIT_API_KEY=XXXX
LIVEKIT_API_SECRET=XXXX
AZURE_SPEECH_KEY=XXXX
AZURE_SPEECH_HOST=https://westeurope.api.cognitive.microsoft.com/

then I create a simple agent:

from dotenv import load_dotenv

from livekit import agents
from livekit.agents import AgentSession, Agent, RoomInputOptions
from livekit.plugins import (
    openai,
    azure,
    silero,
)
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv()


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice AI assistant.")


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    session = AgentSession(
        stt=azure.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=azure.TTS(),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
    )

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))

and I get a WebSocket upgrade error:

python main.py console
2025-04-22 22:18:40,924 - DEBUG asyncio - Using proactor: IocpProactor 
==================================================
     Livekit Agents - Console
==================================================
Press [Ctrl+B] to toggle between Text/Audio mode, [Q] to quit.

2025-04-22 22:18:40,924 - INFO livekit.agents - starting worker {"version": "1.0.16", "rtc-version": "1.0.6"}
2025-04-22 22:18:40,924 - INFO livekit.agents - starting inference executor
2025-04-22 22:18:42,027 - INFO livekit.agents - initializing inference process {"pid": 35380, "inference": true}
2025-04-22 22:18:42,027 - DEBUG livekit.agents - initializing inference runner {"runner": "lk_end_of_utterance_multilingual", "pid": 35380, "inference": true}
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
2025-04-22 22:18:43,511 - INFO livekit.agents - inference process initialized {"pid": 35380, "inference": true}
2025-04-22 22:18:43,511 - DEBUG asyncio - Using proactor: IocpProactor {"pid": 35380, "inference": true}
2025-04-22 22:18:43,511 - INFO livekit.agents - see tracing information at http://localhost:58139/debug
2025-04-22 22:18:43,511 - INFO livekit.agents - initializing job runner {"tid": 23000}
2025-04-22 22:18:43,511 - INFO livekit.agents - job runner initialized {"tid": 23000}
2025-04-22 22:18:43,511 - DEBUG asyncio - Using proactor: IocpProactor
2025-04-22 22:18:45,682 - WARNING livekit.agents - failed to synthesize speech, retrying in 0.1s {"tts": "livekit.plugins.azure.tts.TTS", "attempt": 1, "streamed": false}
Traceback (most recent call last):
  File "C:\Users\remis\neocertif\livekit-agent\.venv\Lib\site-packages\livekit\agents\tts\tts.py", line 203, in _main_task
    return await self._run()
           ^^^^^^^^^^^^^^^^^
  File "C:\Users\remis\neocertif\livekit-agent\.venv\Lib\site-packages\livekit\plugins\azure\tts.py", line 371, in _run
    raise APIConnectionError(cancel_details.error_details)
livekit.agents._exceptions.APIConnectionError: WebSocket upgrade failed: Internal service error (404). Error Details: Failed with HTTP 404 Resource Not Found
wss://westeurope.api.cognitive.microsoft.com/cognitiveservices/websocket/v1
X-ConnectionId: dd63ac2e75084906a33e18bd0dd66c8b
apim-request-id: 0dbd61fe-8e37-4541-9b62-80a93f60ee6d
Content-Type: application/json
{"error":{"code":"404","message": "Resource not found"}} Please check request details. USP state: Sending. Received audio size: 0 bytes.
2025-04-22 22:18:45,946 - WARNING livekit.agents - failed to synthesize speech, retrying in 2.0s {"tts": "livekit.plugins.azure.tts.TTS", "attempt": 2, "streamed": false}
Traceback (most recent call last):
  File "C:\Users\remis\neocertif\livekit-agent\.venv\Lib\site-packages\livekit\agents\tts\tts.py", line 203, in _main_task
    return await self._run()
           ^^^^^^^^^^^^^^^^^
  File "C:\Users\remis\neocertif\livekit-agent\.venv\Lib\site-packages\livekit\plugins\azure\tts.py", line 371, in _run
    raise APIConnectionError(cancel_details.error_details)
livekit.agents._exceptions.APIConnectionError: WebSocket upgrade failed: Internal service error (404). Error Details: Failed with HTTP 404 Resource Not Found
wss://westeurope.api.cognitive.microsoft.com/cognitiveservices/websocket/v1
X-ConnectionId: 3da5ec37a8c84504aaf10938127f5c1a
apim-request-id: 5466cce9-6db1-4932-b265-98994c98a71a
Content-Type: application/json
{"error":{"code":"404","message": "Resource not found"}} Please check request details. USP state: Sending. Received audio size: 0 bytes.
2025-04-22 22:18:46,331 - WARNING livekit.agents - failed to recognize speech, retrying in 0.1s {"tts": "livekit.plugins.azure.stt.STT", "attempt": 0, "streamed": true}
Traceback (most recent call last):
  File "C:\Users\remis\neocertif\livekit-agent\.venv\Lib\site-packages\livekit\agents\stt\stt.py", line 228, in _main_task
    return await self._run()
           ^^^^^^^^^^^^^^^^^
  File "C:\Users\remis\neocertif\livekit-agent\.venv\Lib\site-packages\livekit\plugins\azure\stt.py", line 225, in _run
    raise APIConnectionError("SpeechRecognition session stopped")
livekit.agents._exceptions.APIConnectionError: SpeechRecognition session stopped
2025-04-22 22:18:46,466 - WARNING livekit.agents - failed to recognize speech, retrying in 2.0s {"tts": "livekit.plugins.azure.stt.STT", "attempt": 1, "streamed": true}
Traceback (most recent call last):
  File "C:\Users\remis\neocertif\livekit-agent\.venv\Lib\site-packages\livekit\agents\stt\stt.py", line 228, in _main_task
    return await self._run()
           ^^^^^^^^^^^^^^^^^
  File "C:\Users\remis\neocertif\livekit-agent\.venv\Lib\site-packages\livekit\plugins\azure\stt.py", line 225, in _run
    raise APIConnectionError("SpeechRecognition session stopped")
livekit.agents._exceptions.APIConnectionError: SpeechRecognition session stopped
2025-04-22 22:18:48,114 - WARNING livekit.agents - failed to synthesize speech, retrying in 2.0s {"tts": "livekit.plugins.azure.tts.TTS", "attempt": 3, "streamed": false}
Traceback (most recent call last):
  File "C:\Users\remis\neocertif\livekit-agent\.venv\Lib\site-packages\livekit\agents\tts\tts.py", line 203, in _main_task
    return await self._run()
           ^^^^^^^^^^^^^^^^^
  File "C:\Users\remis\neocertif\livekit-agent\.venv\Lib\site-packages\livekit\plugins\azure\tts.py", line 371, in _run
    raise APIConnectionError(cancel_details.error_details)
livekit.agents._exceptions.APIConnectionError: WebSocket upgrade failed: Internal service error (404). Error Details: Failed with HTTP 404 Resource Not Found
wss://westeurope.api.cognitive.microsoft.com/cognitiveservices/websocket/v1
X-ConnectionId: b14588513fe647b0926eface78e5174c
apim-request-id: 0fb5951b-c573-49ab-9142-36e314d6bd6e
Content-Type: application/json
{"error":{"code":"404","message": "Resource not found"}} Please check request details. USP state: Sending. Received audio size: 0 bytes.
2025-04-22 22:19:01,285 - INFO livekit.agents - shutting down worker {"id": "unregistered"}
2025-04-22 22:19:01,287 - DEBUG livekit.agents - shutting down job task {"reason": "", "user_initiated": false}
2025-04-22 22:19:01,288 - DEBUG livekit.agents - http_session(): closing the httpclient ctx
2025-04-22 22:19:01,288 - DEBUG livekit.agents - http_session(): creating a new httpclient ctx
2025-04-22 22:19:01,288 - DEBUG livekit.agents - job exiting {"reason": "", "tid": 23000, "job_id": "simulated-job-ff5d146d1332"}
2025-04-22 22:19:01,288 - INFO livekit.agents - process exiting {"reason": "", "pid": 35380, "inference": true}
2025-04-22 22:19:01,523 - ERROR asyncio - Task exception was never retrieved
future: <Task finished name='TTS._synthesize_task' coro=<ChunkedStream._main_task() done, defined at C:\Users\remis\neocertif\livekit-agent\.venv\Lib\site-packages\livekit\agents\tts\tts.py:200> exception=APIConnectionError('failed to synthesize speech after 4 attempts')>
Traceback (most recent call last):
  File "C:\Users\remis\neocertif\livekit-agent\.venv\Lib\site-packages\livekit\agents\tts\tts.py", line 203, in _main_task
    return await self._run()
           ^^^^^^^^^^^^^^^^^
  File "C:\Users\remis\neocertif\livekit-agent\.venv\Lib\site-packages\livekit\plugins\azure\tts.py", line 371, in _run
    raise APIConnectionError(cancel_details.error_details)
livekit.agents._exceptions.APIConnectionError: WebSocket upgrade failed: Internal service error (404). Error Details: Failed with HTTP 404 Resource Not Found
wss://westeurope.api.cognitive.microsoft.com/cognitiveservices/websocket/v1
X-ConnectionId: fd0f0934a0914d23a4350f2cbe40a350
apim-request-id: d0b4baf6-2e7b-4543-a547-5a37214e75fb
Content-Type: application/json
{"error":{"code":"404","message": "Resource not found"}} Please check request details. USP state: Sending. Received audio size: 0 bytes.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\remis\neocertif\livekit-agent\.venv\Lib\site-packages\livekit\agents\tts\tts.py", line 211, in _main_task
    raise APIConnectionError(
livekit.agents._exceptions.APIConnectionError: failed to synthesize speech after 4 attempts

remisharrock avatar Apr 22 '25 20:04 remisharrock

@theomonnom hi (are you in Paris too?); @jayeshp19 hi; could this be related to https://github.com/livekit/agents/pull/2008 ?

remisharrock avatar Apr 22 '25 20:04 remisharrock

By the way, I found something interesting: one endpoint answers 401 and the other 404!

 curl wss://westeurope.tts.speech.microsoft.com/cognitiveservices/websocket/v1
curl: (22) Refused WebSockets upgrade: 401
 curl wss://westeurope.api.cognitive.microsoft.com/cognitiveservices/websocket/v1
curl: (22) Refused WebSockets upgrade: 404
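
The same check can be reproduced from Python with only the standard library. This is a rough sketch, not part of livekit or the Azure SDK: the function names `upgrade_headers` and `probe_upgrade` are made up for illustration. It sends a plain WebSocket opening handshake (RFC 6455) with no credentials and reports the HTTP status the server answers with.

```python
import base64
import http.client
import os
from urllib.parse import urlsplit


def upgrade_headers() -> dict[str, str]:
    """Build the headers of a WebSocket opening handshake (RFC 6455)."""
    # The key is 16 random bytes, base64-encoded, as the RFC requires.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    return {
        "Connection": "Upgrade",
        "Upgrade": "websocket",
        "Sec-WebSocket-Version": "13",
        "Sec-WebSocket-Key": key,
    }


def probe_upgrade(url: str, timeout: float = 10.0) -> int:
    """Send the handshake to a wss:// URL and return the HTTP status code."""
    parts = urlsplit(url)
    if parts.scheme != "wss" or not parts.hostname:
        raise ValueError(f"expected a wss:// URL, got {url!r}")
    conn = http.client.HTTPSConnection(
        parts.hostname, parts.port or 443, timeout=timeout
    )
    try:
        # No key or token is sent, so this only tests whether the route exists.
        conn.request("GET", parts.path or "/", headers=upgrade_headers())
        return conn.getresponse().status
    finally:
        conn.close()
```

Calling `probe_upgrade(...)` on each of the two URLs above should let you compare the status codes without curl. In general HTTP terms, a 401 means the route exists but wants credentials, while a 404 means the path is not served on that host at all.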

remisharrock avatar Apr 22 '25 20:04 remisharrock

It says {"error":{"code":"404","message": "Resource not found"}} Please check request details. Please check that you're using the correct credentials.

jayeshp19 avatar Apr 23 '25 05:04 jayeshp19

Dear @jayeshp19, I'm using the correct credentials (they work with many other services like livekit or ten agent, etc.), and I think the WebSocket upgrade fails because the URL mentioned in the log is not the right one:

wss://westeurope.api.cognitive.microsoft.com/cognitiveservices/websocket/v1 is the one that fails with 404 in the logs.

As you can see, the URL starts with westeurope.api.cognitive, which is the host used for HTTPS but not for WSS according to the documentation below.

The second one, which I tried with curl, starts with westeurope.tts.speech. I found it in the documentation section called "Construct endpoint URL", which says:

Usually in SDK scenarios (and in the speech to text REST API for short audio and text to speech REST API scenarios), Speech resources use the dedicated regional endpoints for different service offerings. The DNS name format for these endpoints is:

{region}.{speech service offering}.speech.microsoft.com

An example DNS name is:

westeurope.stt.speech.microsoft.com

Here is the doc:

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-services-private-link?tabs=portal#construct-endpoint-url
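
To make the naming rule concrete, here is a small sketch that builds a host name from the documented pattern and checks whether a given URL uses it. The helper names `speech_host` and `uses_dedicated_speech_host` are hypothetical and are not part of livekit or the Azure SDK. Only the `{region}.{speech service offering}.speech.microsoft.com` format comes from the documentation quoted above.

```python
from urllib.parse import urlsplit

# Suffix taken from the DNS name format quoted from the Microsoft docs.
SPEECH_SUFFIX = ".speech.microsoft.com"


def speech_host(region: str, offering: str) -> str:
    """Build a dedicated regional host, e.g. region 'westeurope', offering 'stt'."""
    if not region or not offering:
        raise ValueError("region and offering must be non-empty")
    return f"{region}.{offering}{SPEECH_SUFFIX}"


def uses_dedicated_speech_host(url: str) -> bool:
    """Return True if the URL's host follows the dedicated speech pattern."""
    host = (urlsplit(url).hostname or "").lower()
    return host.endswith(SPEECH_SUFFIX)
```

With these helpers, `speech_host("westeurope", "stt")` gives the example DNS name from the documentation. The URL from the failing log, which starts with westeurope.api.cognitive, does not match the pattern, whereas the westeurope.tts.speech URL tried with curl does.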

When you say "check request details", I don't know where to check that. I will try to find where this URL is built in the code; is that what you had in mind?

Thanks for your help

remisharrock avatar Apr 23 '25 16:04 remisharrock

and the error line says:

livekit.agents._exceptions.APIConnectionError: WebSocket upgrade failed: Internal service error (404). Error Details: Failed with HTTP 404 Resource Not Found wss://westeurope.api.cognitive.microsoft.com/cognitiveservices/websocket/v1

Do you think I could try the two files in "standalone" mode to debug? This one https://github.com/livekit/agents/blob/main/livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/stt.py and this one https://github.com/livekit/agents/blob/main/livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py ?
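
While debugging, it may help to pull the failing URL and status code out of the exception text automatically. The sketch below is only an illustration: it assumes the message keeps the shape shown in the logs above, and `parse_upgrade_failure` and `UpgradeFailure` are hypothetical names, not livekit APIs.

```python
import re
from typing import NamedTuple, Optional


class UpgradeFailure(NamedTuple):
    status: int  # HTTP status reported in parentheses, e.g. 404
    url: str     # the wss:// URL the SDK tried to reach


# DOTALL lets the URL sit on the line after the message, as in the raw logs.
_PATTERN = re.compile(
    r"WebSocket upgrade failed:.*?\((\d{3})\).*?(wss://\S+)", re.DOTALL
)


def parse_upgrade_failure(message: str) -> Optional[UpgradeFailure]:
    """Extract (status, url) from an error message, or None if it does not match."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    return UpgradeFailure(int(match.group(1)), match.group(2))
```

Feeding it `str(exc)` from a caught `APIConnectionError` would show at a glance which host the plugin actually dialed. It returns `None` for unrelated messages such as "SpeechRecognition session stopped".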

remisharrock avatar Apr 23 '25 16:04 remisharrock