"opening handshake failed" for websocket endpoint
First Check
- [X] I added a very descriptive title to this issue.
- [X] I used the GitHub search to find a similar issue and didn't find it.
- [X] I searched the FastAPI documentation, with the integrated search.
- [X] I already searched in Google "How to X in FastAPI" and didn't find any information.
- [X] I already read and followed all the tutorial in the docs and didn't find an answer.
- [X] I already checked if it is not related to FastAPI but to Pydantic.
- [X] I already checked if it is not related to FastAPI but to Swagger UI.
- [X] I already checked if it is not related to FastAPI but to ReDoc.
Commit to Help
- [X] I commit to help with one of those options 👆
Example Code
-
Description
Hey!
We are using FastAPI to set up a websocket endpoint, served by uvicorn workers.
Very often we see errors saying "opening handshake failed" together with this stack trace:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/websockets/legacy/server.py", line 163, in handler
    await self.handshake(
  File "/usr/local/lib/python3.10/site-packages/websockets/legacy/server.py", line 597, in handshake
    raise self.connection_closed_exc()  # pragma: no cover
websockets.exceptions.ConnectionClosedError: no close frame received or sent
Right after that we see a second error, which claims that something awaited inside await websocket.receive_json() is not an awaitable, although receive_json() itself of course is:
Traceback (most recent call last):
  File "/events/websocket.py", line 121, in receive_events
    content: Dict[str, Any] = await websocket.receive_json()
  File "/usr/local/lib/python3.10/site-packages/starlette/websockets.py", line 132, in receive_json
    message = await self.receive()
  File "/usr/local/lib/python3.10/site-packages/starlette/websockets.py", line 45, in receive
    message = await self._receive()
  File "/usr/local/lib/python3.10/site-packages/uvicorn/protocols/websockets/websockets_impl.py", line 336, in asgi_receive
    data = await self.recv()
  File "/usr/local/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 536, in recv
    await asyncio.wait(
  File "/usr/local/lib/python3.10/asyncio/tasks.py", line 382, in wait
    fs = {ensure_future(f, loop=loop) for f in fs}
  File "/usr/local/lib/python3.10/asyncio/tasks.py", line 382, in <setcomp>
    fs = {ensure_future(f, loop=loop) for f in fs}
  File "/usr/local/lib/python3.10/asyncio/tasks.py", line 615, in ensure_future
    return _ensure_future(coro_or_future, loop=loop)
  File "/usr/local/lib/python3.10/asyncio/tasks.py", line 630, in _ensure_future
    raise TypeError('An asyncio.Future, a coroutine or an awaitable '
TypeError: An asyncio.Future, a coroutine or an awaitable is required
My understanding of the first error is that the client manages to disconnect before the websocket handshake is finalised. If so, I'd like to handle that error by simply dropping it! But I do not understand where or how to handle it more gracefully, since it is raised so deep down in the websocket server code used by FastAPI. We have tried adding an exception handler for ConnectionClosedError, but it is never called. We have, however, been successful in catching the second error, the TypeError, by wrapping the receive_json() method on the websocket object.
Looking into websockets/legacy/server.py in the websockets lib only shows me that this error message is a final "catch all", and I cannot see any other information that would help me understand why I see these errors.
I have 2 questions:
- Is my guess correct that this error can occur when a client disconnects before the handshake is finished?
- How can I catch this kind of error and simply drop it? If it happens because a client disconnects before the handshake is done, there is nothing I can do about it, so I do not care about these errors, but I do not want them to spam my logs.
Operating System
Linux
Operating System Details
Kubernetes 1.22 Containerd (cos_containerd) Google Kubernetes Engine
FastAPI Version
0.78.0
Python Version
Python 3.10.7
Additional Context
No response
The error can be caused by many things, but typically it means the network connection was lost. Since you are running on K8S, this might be because a pod is rescheduled to another node, or because a client loses network connectivity.
To catch such an occurrence and handle it properly, I would wrap the whole await websocket.accept() call and the subsequent code in a try-except block. That might be overkill, but it is hard to say without seeing the full stack trace and the relevant bits of code.
@Jacobh2 could you provide example code that reproduces the exception? When I use a websocket in a local environment, I don't have this problem.
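A minimal sketch of that idea, under stated assumptions: `receive_events` mirrors the function name from the traceback, while `handle` and `disconnect_errors` are hypothetical names introduced here for illustration. The tuple only lists the TypeError seen in this thread; in a real FastAPI app you would probably also add starlette's WebSocketDisconnect. Whether this catches the handshake error at all depends on where that error is raised.

```python
import logging
from typing import Any, Awaitable, Callable, Dict, Tuple, Type

logger = logging.getLogger("events.websocket")  # hypothetical logger name


async def receive_events(
    websocket: Any,
    handle: Callable[[Dict[str, Any]], Awaitable[None]],
    disconnect_errors: Tuple[Type[BaseException], ...] = (TypeError,),
) -> int:
    """Accept the connection and read JSON messages until the client is gone.

    `websocket` is anything with awaitable accept() and receive_json()
    methods, such as a FastAPI/Starlette WebSocket. Returns the number of
    messages passed to `handle`.
    """
    handled = 0
    try:
        await websocket.accept()
    except disconnect_errors as exc:
        # The client went away during accept; nothing more to do.
        logger.debug("client gone during accept: %r", exc)
        return handled

    while True:
        try:
            content = await websocket.receive_json()
        except disconnect_errors as exc:
            # Treat the listed errors as a normal disconnect, not a crash.
            logger.debug("client disconnected: %r", exc)
            return handled
        # Handled outside the try block so errors in `handle` are not hidden.
        await handle(content)
        handled += 1
```

Only the accept and receive calls sit inside the try blocks, so a TypeError raised by your own message handling still surfaces normally.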
Hey! So I've now tried wrapping the await websocket.accept() in a try-except block, but unfortunately it doesn't help. The two stack traces are complete and are the only thing I can see in the logs. To me it looks like the error happens before it even reaches our code, somewhere in the websockets lib in websockets/legacy/server.py. How and when is that started or called from FastAPI?
I'm working on a minimal setup that I can share with you, but I'm having trouble reproducing the kind of traffic that we have in production. My thinking is that our clients connect and disconnect very randomly and abruptly, which is fine - I just don't want to crash and error-log every time it happens, I simply want to say "OK, a disconnect is fine". But for that I need to understand where the error is raised.
What also confuses me is that adding a fastapi exception-handler for the websockets.exceptions.ConnectionClosedError error doesn't help! I would expect that handler to be called, but it doesn't.
Thanks for the help @JarroVGIT !
@Jacobh2 please add a self-contained, minimal, reproducible, example that I can copy-paste to replicate it.
I have also seen this exception on our production servers. My current understanding of the issue is that the handshake fails (probably due to a connection abort, which also explains why it's so difficult to reproduce locally), but for some reason, the exception is not propagated and thus the receive endpoint triggers the TypeError in asyncio.wait.
This is the closest I have gotten to a reproducer:
from websockets.connection import State
from websockets.legacy.server import WebSocketServerProtocol

# Monkeypatch the handshake so that every connection looks closed before it starts.
orig_handshake = WebSocketServerProtocol.handshake

async def hook_handshake(self, *args, **kwargs):
    self.state = State.CLOSED  # !!! <- Simulate connection issues
    return await orig_handshake(self, *args, **kwargs)

WebSocketServerProtocol.handshake = hook_handshake
from fastapi import FastAPI, WebSocket
from fastapi.responses import HTMLResponse
app = FastAPI()
html = """
<!DOCTYPE html>
<html>
<head>
<title>WebSocket Bug?</title>
</head>
<body>
<script>
var ws = new WebSocket("ws://localhost:8000/ws");
</script>
</body>
</html>
"""
@app.get("/")
async def get():
    return HTMLResponse(html)

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    await websocket.receive_text()
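Since the handshake error appears to be logged before the endpoint code runs, one way to keep it out of the logs could be a standard logging filter rather than an exception handler. The sketch below is not a confirmed fix: the logger names "uvicorn.error" and "websockets.server" are assumptions about where the record is emitted (this may differ between uvicorn and websockets versions), and `install_handshake_filter` is a hypothetical helper name.

```python
import logging
from typing import Iterable


class DropHandshakeFailures(logging.Filter):
    """Drop log records whose message contains 'opening handshake failed'."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the logging machinery to discard the record.
        return "opening handshake failed" not in record.getMessage()


def install_handshake_filter(
    # Assumed logger names; check which logger emits the record in your setup.
    logger_names: Iterable[str] = ("uvicorn.error", "websockets.server"),
) -> DropHandshakeFailures:
    """Attach the filter to each named logger and return the filter."""
    handshake_filter = DropHandshakeFailures()
    for name in logger_names:
        logging.getLogger(name).addFilter(handshake_filter)
    return handshake_filter
```

A filter attached to a logger only applies to records logged directly through that logger, not to records propagated up from its children, so it has to be installed on the exact logger that emits the message. Calling `install_handshake_filter()` once at startup, before the server begins accepting connections, should be enough if the assumed names are right.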