"opening handshake failed" for websocket endpoint

Open Jacobh2 opened this issue 3 years ago • 2 comments

First Check

  • [X] I added a very descriptive title to this issue.
  • [X] I used the GitHub search to find a similar issue and didn't find it.
  • [X] I searched the FastAPI documentation, with the integrated search.
  • [X] I already searched in Google "How to X in FastAPI" and didn't find any information.
  • [X] I already read and followed all the tutorial in the docs and didn't find an answer.
  • [X] I already checked if it is not related to FastAPI but to Pydantic.
  • [X] I already checked if it is not related to FastAPI but to Swagger UI.
  • [X] I already checked if it is not related to FastAPI but to ReDoc.

Commit to Help

  • [X] I commit to help with one of those options 👆

Example Code

-

Description

Hey!

We are using FastAPI to set up a websocket endpoint, running on uvicorn workers.

Very often we see errors saying "opening handshake failed" together with this stack trace:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/websockets/legacy/server.py", line 163, in handler
    await self.handshake(
  File "/usr/local/lib/python3.10/site-packages/websockets/legacy/server.py", line 597, in handshake
    raise self.connection_closed_exc()  # pragma: no cover
websockets.exceptions.ConnectionClosedError: no close frame received or sent

Right after that, we see this other error, saying that await websocket.receive_json() isn't awaitable, which of course it is.

Traceback (most recent call last):
  File "/events/websocket.py", line 121, in receive_events
    content: Dict[str, Any] = await websocket.receive_json()
  File "/usr/local/lib/python3.10/site-packages/starlette/websockets.py", line 132, in receive_json
    message = await self.receive()
  File "/usr/local/lib/python3.10/site-packages/starlette/websockets.py", line 45, in receive
    message = await self._receive()
  File "/usr/local/lib/python3.10/site-packages/uvicorn/protocols/websockets/websockets_impl.py", line 336, in asgi_receive
    data = await self.recv()
  File "/usr/local/lib/python3.10/site-packages/websockets/legacy/protocol.py", line 536, in recv
    await asyncio.wait(
  File "/usr/local/lib/python3.10/asyncio/tasks.py", line 382, in wait
    fs = {ensure_future(f, loop=loop) for f in fs}
  File "/usr/local/lib/python3.10/asyncio/tasks.py", line 382, in <setcomp>
    fs = {ensure_future(f, loop=loop) for f in fs}
  File "/usr/local/lib/python3.10/asyncio/tasks.py", line 615, in ensure_future
    return _ensure_future(coro_or_future, loop=loop)
  File "/usr/local/lib/python3.10/asyncio/tasks.py", line 630, in _ensure_future
    raise TypeError('An asyncio.Future, a coroutine or an awaitable '
TypeError: An asyncio.Future, a coroutine or an awaitable is required

My understanding of the first error is that the client manages to disconnect before the websocket handshake is finalised? If so, I'd like to be able to handle that error by simply dropping it! But I do not understand where/how to handle this error more gracefully, since it is so deep down in the websocket server code used by FastAPI. We have tried adding exception handling for ConnectionClosedError, but that handler is never called. We have, however, been successful in catching the second error, the TypeError, by wrapping the receive_json() method on the websocket object.
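The wrapper mentioned for the second error could look roughly like the sketch below. The helper name `receive_json_or_none` is hypothetical, and treating the `TypeError` as a disconnect is the workaround described in this post, not a documented FastAPI pattern. The helper only assumes that the object it is given has an awaitable `receive_json()` method.

```python
from typing import Any, Dict, Optional


async def receive_json_or_none(websocket: Any) -> Optional[Dict[str, Any]]:
    """Return the next JSON message, or None if receiving fails with TypeError.

    Hypothetical helper: `websocket` is any object with an async
    `receive_json()` method, such as the WebSocket passed to the endpoint.
    """
    try:
        return await websocket.receive_json()
    except TypeError:
        # Assumption: in this setup the TypeError only shows up after a
        # failed handshake, so it is treated as "client is gone".
        return None
```

The caller can stop its receive loop when the helper returns `None`. Note that this also hides any unrelated `TypeError` raised inside `receive_json()`, so it is a stopgap rather than a fix.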

Looking into the websockets/legacy/server.py code of the websockets lib only shows me that this error message is a final "catch all" and I cannot see any other info helping me understand why I see these.

I have 2 questions:

  1. Is my guess correct that this error can occur when a client disconnects before the handshake is finished?
  2. How can I catch this kind of error and simply drop it? If it happens because a client disconnects before the handshake is done, there is nothing I can do, so I do not care about these errors, but I do not want them to spam my logs.
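For the second question, one option that does not depend on where the exception is raised is to filter the log record itself. Below is a minimal sketch using only the standard `logging` module. The class name `DropHandshakeFailures` is hypothetical, and the logger names are assumptions: which logger emits the message depends on the uvicorn and websockets versions in use, so check the logger name in your own log output first.

```python
import logging


class DropHandshakeFailures(logging.Filter):
    """Discard log records whose message contains the handshake-failure text."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells the logging module to drop the record.
        return "opening handshake failed" not in record.getMessage()


# Assumption: one of these loggers emits the message; adjust to your setup.
for name in ("uvicorn.error", "websockets.server"):
    logging.getLogger(name).addFilter(DropHandshakeFailures())
```

A filter attached to a logger only applies to records logged directly on that logger, not on its child loggers, so it has to be added to the exact logger that emits the message (or to the handler that writes the output).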

Operating System

Linux

Operating System Details

Kubernetes 1.22 Containerd (cos_containerd) Google Kubernetes Engine

FastAPI Version

0.78.0

Python Version

Python 3.10.7

Additional Context

No response

Jacobh2 avatar Sep 19 '22 13:09 Jacobh2

The error can be caused by many things, but typically it is because the network connection was lost. Since you are running on K8s, this might be because a pod is rescheduled to another node, or because a client loses network connectivity.

To catch such an occurrence and handle it properly, I would wrap the whole await websocket.accept() call and the subsequent code in a try-except block. That might be overkill, but it is hard to say without seeing the full stack trace and the relevant code.

JarroVGIT avatar Sep 19 '22 18:09 JarroVGIT

@Jacobh2 could you provide example code to reproduce the exception? When I use a websocket in my local environment, I don't see this problem.

csrgxtu avatar Sep 20 '22 07:09 csrgxtu

Hey! So I've now tried wrapping the await websocket.accept() in a try-except block, but unfortunately it doesn't help. The two stack traces are complete and are the only thing I can see in the logs. To me it looks like the error happens before it even reaches our code, somewhere in the websockets lib, in websockets/legacy/server.py. How/when is that started/called from FastAPI?

I'm working on a minimal setup that I can share with you, but I'm having trouble reproducing the kind of traffic that we have in production. My thinking is that our client connects and disconnects very randomly and abruptly, which is fine. I just don't want to crash and log an error every time it happens; I simply want to say "OK, a disconnect is fine". But for that I need to understand where the error is raised.

What also confuses me is that adding a FastAPI exception handler for websockets.exceptions.ConnectionClosedError doesn't help! I would expect that handler to be called, but it isn't.

Jacobh2 avatar Sep 22 '22 07:09 Jacobh2

Thanks for the help @JarroVGIT !

@Jacobh2 please add a self-contained, minimal, reproducible example that I can copy-paste to replicate it.

tiangolo avatar Nov 20 '22 12:11 tiangolo

I have also seen this exception on our production servers. My current understanding of the issue is that the handshake fails (probably due to a connection abort, which also explains why it's so difficult to reproduce locally), but for some reason, the exception is not propagated and thus the receive endpoint triggers the TypeError in asyncio.wait.
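The `TypeError` at the bottom of the second traceback is what `asyncio.ensure_future` raises for any object that is not a future, a coroutine, or an awaitable. Here is a small standard-library illustration. Passing `None` is an assumption: it stands in for whatever object the websockets library hands to `asyncio.wait` after the failed handshake, which the traceback does not show.

```python
import asyncio

try:
    # None stands in for the unknown non-awaitable object (assumption).
    asyncio.ensure_future(None)
except TypeError as exc:
    message = str(exc)
    print(message)
```

This fits the reading that the endpoint's receive call runs against a connection whose handshake never completed.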

This is the closest I have gotten to a reproducer:

from websockets.connection import State
from websockets.legacy.server import WebSocketServerProtocol

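# Keep a reference to the original method so the hook can delegate to it.
orig_handshake = WebSocketServerProtocol.handshake

async def hook_handshake(self, *args, **kwargs):
    self.state = State.CLOSED  # !!! <- Simulate connection issues
    return await orig_handshake(self, *args, **kwargs)


# Monkey-patch the protocol class so every handshake starts from a closed state.
WebSocketServerProtocol.handshake = hook_handshake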

from fastapi import FastAPI, WebSocket
from fastapi.responses import HTMLResponse

app = FastAPI()

html = """
<!DOCTYPE html>
<html>
    <head>
        <title>WebSocket Bug?</title>
    </head>
    <body>
        <script>
            var ws = new WebSocket("ws://localhost:8000/ws");
        </script>
    </body>
</html>
"""


@app.get("/")
async def get():
    return HTMLResponse(html)


@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    await websocket.receive_text()

slackner avatar Nov 20 '22 18:11 slackner