autogen icon indicating copy to clipboard operation
autogen copied to clipboard

Heavy use of broad Exceptions leads to incorrect error handling

Open rhoef opened this issue 9 months ago • 3 comments

What happened?

Describe the bug I was working through the agentchat_fastapi examples—specifically the WebSocket example (app_team.py). While the example works out of the box, I tried connecting using a custom WebSocket client. Submitting and receiving messages works just fine.

However, when I close the socket on the client side, the server prints an error message and continues to listen.

    raise RuntimeError(f"Failed to get user input: {str(e)}") from e
RuntimeError: Failed to get user input: Failed to get user input: (1000, '')
Unexpected error: Unexpected ASGI message 'websocket.send', after sending 'websocket.close' or response already completed

Tracking down the issue was fairly straightforward. The server tries one last time to send a message because it's still waiting for user input (the app_team.py wraps websocket.receive_json inside a UserProxy agent, so it's effectively waiting for input).

What happens next is also simple:

  • The WebSocket raises a WebSocketDisconnect exception, as it should.
  • This exception is caught by a broad Exception, and then a RuntimeError is raised instead:
    raise RuntimeError("Failed to get user input: )

Trying to address this by subclassing the UserProxy agent revealed that this pattern—i.e., catching broad exceptions—is used quite frequently. This obscures the root cause of errors.

In this particular example (WebSocketDisconnect), it prevents the chat-endpoint in the agent_fastapi examples from returning gracefully after the client disconnects. (Note: there's a workaround using nested try/except blocks in the examples. However, the inner loop also uses a broad except that fails when trying to send on a closed socket.)

I've found this pattern of obscuring exceptions throughout the codebase, not just here.

To Reproduce Run app_team.py from the agentchat_fastapi example and run the following script. You can also kill the script instead of closing the socket. Ensure that the Server waits for input after at least one message was sent.

import asyncio
import json

import websockets


async def wsclient(uri="ws://localhost:8002/ws/chat"):
    """
    1. send message to server
    2. receive messages from server
    3. server is waiting for input (UserProxyAgent)
    3. close websocket on client side
    """
    async with websockets.connect(uri) as websocket:
        await websocket.send(json.dumps({"source": "user", "content": "Hello World"}))
        async for response in websocket:

            response = json.loads(response)
            print(f"{response['source']}", response["type"])

            if response["type"] == "UserInputRequestedEvent":

                # causes incorrect errorhandling on the server side if
                # close the socket while the server is waiting for input
                await websocket.close()


if __name__ == "__main__":
    asyncio.run(wsclient())

Expected behavior Avoid catching of broad/blind Exceptions see:

  • https://pylint.readthedocs.io/en/latest/user_guide/messages/warning/broad-exception-caught.html
  • https://docs.astral.sh/ruff/rules/blind-except/

The WebSocketDisconnect exception should be propagated so that the chat_endpoint can exit gracefully. No reraising other Exception classes, just raise or raise the same Exception Class e.g. raise WebSockentDisconnect('cusom message' from e

Which packages was the bug in?

Python AgentChat (autogen-agentchat>=0.4.0)

AutoGen library version.

Python 0.5.1

rhoef avatar Apr 11 '25 12:04 rhoef

@jackgerrits ,

Any general thoughts here?

victordibia avatar Apr 11 '25 15:04 victordibia

https://gist.github.com/rhoef/2ac5f28fde18d81d3e7e60a523254a68#file-app_team-py-L18

In this Gist I have subclassed the UserProxyAgent to solve the problem at least for the WebsocketDisconnet. For the purpose of the gist, it also contains the customize class.

The changes are:

  • Disable the ROOT_LOGGER, since it would still print the traceback (perhaps it should not log an error)
  • CustomUserProxyAgent, remove broad Exceptions in function get_input and on_messages_stream
  • Simply the loop: remove nested exceptions, add two break conditions, one with and one without sending error to the client.

rhoef avatar Apr 11 '25 19:04 rhoef

I agree with your observation. A lot of broad brushes have been used so far. We should address this.

ekzhu avatar Apr 15 '25 05:04 ekzhu

Yeah agreed, we are not specific enough for exceptions making them unactionable or impossible to correctly catch

jackgerrits avatar May 07 '25 17:05 jackgerrits