Broker drops websocket clients if they send events too quickly
Using the code here:
https://github.com/simeonmiteff/repro-zeek-broker-ws-disconnect
Running the Go app:
$ go build && ./repro-zeek-broker-ws-disconnect
2024/09/18 10:39:37 connected to broker endpoint=056d3b6a-2161-5be6-8b68-30328ee79b91 version=2.8.0-dev
2024/09/18 10:39:37 failed with error=write tcp 127.0.0.1:36976->127.0.0.1:16666: write: connection reset by peer
Running Zeek with Broker debugging enabled:
$ BROKER_CONSOLE_VERBOSITY=debug zeek test.zeek
[broker/INFO] 2024-09-18T10:39:35.037 creating endpoint 056d3b6a-2161-5be6-8b68-30328ee79b91
<snip>
peer added, endpoint=[id=62014dce-47b7-5ac8-8606-a50ac39d61f6, network=[address=127.0.0.1, bound_port=36976/tcp]], msg=handshake successful
[broker/DEBUG] 2024-09-18T10:39:37.497 got 1 messages from bounded buffer
[broker/DEBUG] 2024-09-18T10:39:37.497 drained buffer, extinguish flare
[broker/DEBUG] 2024-09-18T10:39:37.497 polled 1 messages
<snip>
[broker/DEBUG] 2024-09-18T10:39:37.514 client 127.0.0.1:36976 disconnected
[broker/DEBUG] 2024-09-18T10:39:37.515 got 1 messages from bounded buffer
[broker/DEBUG] 2024-09-18T10:39:37.515 drained buffer, extinguish flare
[broker/DEBUG] 2024-09-18T10:39:37.515 polled 1 messages
[broker/DEBUG] 2024-09-18T10:39:37.515 got 4 messages from bounded buffer
[broker/DEBUG] 2024-09-18T10:39:37.515 polled 4 messages
[broker/DEBUG] 2024-09-18T10:39:37.515 got 33 messages from bounded buffer
[broker/DEBUG] 2024-09-18T10:39:37.515 polled 33 messages
[broker/DEBUG] 2024-09-18T10:39:37.515 got 46 messages from bounded buffer
[broker/DEBUG] 2024-09-18T10:39:37.515 drained buffer, extinguish flare
[broker/DEBUG] 2024-09-18T10:39:37.515 polled 46 messages
peer lost at cnt=467, endpoint=[id=62014dce-47b7-5ac8-8606-a50ac39d61f6, network=[address=127.0.0.1, bound_port=36976/tcp]], msg=lost connection to client
received termination signal
<snip>
Note that this doesn't always happen on the first try; you may need to run it a few times.
In general it looks like:
- If events are sent rapidly, Broker's buffer can grow to the point where the client is silently dropped.
- Slowing down sending (by sleeping in the client TX loop, or sending larger events) prevents this from happening.
I can accept that disconnecting clients under these conditions may be reasonable (I don't feel qualified to make that judgement), but in that case I would suggest some or all of the following:
- zeek logging an error;
- zeek calling an error event to allow scripts to react;
- broker sending back an error message to the websocket client before disconnecting it.
Thanks for reporting! I'm currently prioritizing https://github.com/zeek/broker/issues/426, but I'll pick this one up afterwards.
Did you have a chance to look at this, @Neverlord?
@simeonmiteff — Once Zeek 7.2 is released, would you mind repeating your test against the new Zeek-stack websocket?
Yes! See https://github.com/zeek/zeek/issues/4420
Did you have a chance to look at this, @Neverlord?
My assumption was that the Broker WebSocket API is basically obsolete now that Arne's work has moved the WebSocket interface into Zeek itself. I should've raised this question in one of our huddles, though.
@ckreibich, @MP-Corelight what's the status of the WebSocket bindings at this point? Do we plan on supporting both implementations (in Zeek and in Broker)? Or are we going to remove the Broker implementation with Zeek 8 (which was my assumption, again, sorry for not discussing this properly)?