diart
diart copied to clipboard
feat: Add WebSocket server with multi-client support
Overview
Implements a WebSocket server that can handle audio streams from multiple client connections
Changes
- Added multi-client support to WebSocket server
- Created
StreamingInferenceHandlerfor managing connections - Added Dockerfile for easier deployment
Testing
- Tested with multiple concurrent clients
- Verified Docker container functionality
- Checked resource cleanup on disconnection
Please let me know if any changes or improvements are needed!
Fixes #252
Couple things I still want to work on:
-
The way resources are shared between client connections. Currently, each connection shares the same config but the models (for segmentation and embedding, in
SpeakerDiarization) are initialised and maintained separately. This is a more flexible design, but quite wasteful.- I have attempted a fix at this, but parallel connections ended up sharing state. From what I understood, I might need to dig deeper into the Aggregation steps in the pipeline.
-
Related to the above point. I was also wondering if different configs could share the same underlying model resources at runtime. For eg. I have seen performance differ a lot for the same config when number of speakers are far apart - say, 2 vs 10. And this is a parameter that a client is better suited to configure, when setting up the connection to the server.
I have also added a cleanup step in the server, when a client disconnects. This was mostly to ensure explicit memory management since client streams are not sharing resources, at the moment - but this should also address #255
Moved to LazyModel for resource management, based off this comment. Client-specific Pipeline instances now share resources that are initialised in a common PipelineConfig instance.
Still unsure about how this would scale with client connections - would appreciate any thoughts on this!
@janaab11 about your question concerning the wasteful model copies, I fully agree with this limitation. However, I think it would be best suited for a separate PR, it's a pretty big amount of work and I would hate to delay the multi-client feature because of it. Glad to discuss it in a future PR if that interests you!
@janaab11 about your question concerning the wasteful model copies, I fully agree with this limitation. However, I think it would be best suited for a separate PR, it's a pretty big amount of work and I would hate to delay the multi-client feature because of it. Glad to discuss it in a future PR if that interests you!
Definitely interested in resolving this - happy to take it up after closing the work here!
WebSocketAudioSource: I do agree it is acting as a proxy, but prefer keeping it as an AudioSource subclass - makes it consistent with other audio sources. Would like to think more about this and then propose changes.
Oh it would definitely still be a subclass of AudioSource, my suggestion was simply to move it to websockets.py so we hide it from the end user. I don't think such a proxy audio source would be needed under normal circumstances
Oh it would definitely still be a subclass of
AudioSource, my suggestion was simply to move it towebsockets.pyso we hide it from the end user. I don't think such a proxy audio source would be needed under normal circumstances
Okay this does make sense to me too - not exposing such websocket-specific functionality. Made the changes!
Added error handling for the following edge cases - in send, close and _on_message_received methods:
if client is None:
return
if client_id not in self._clients:
return
These edge-cases can occur due to race conditions in the client lifecycle (connect/disconnect/cleanup) or network issues that lead to client state mismatches between the server and client. Added warnings to catch these async timing issues, and documented the edge-case conditions in the respective method's docstring.
Modified client.py to handle disconnects properly on KeyboardInterrupt events - i believe this was referenced in another issue as well. Do let me know if the implementation is too involved? I had liked the simplicity of the client before this.
Apart from this, complete documentation in the README is pending. Will get to that next.
@janaab11 do you have any updates on this PR? How can we unblock it?
hi @juanmc2005 sorry for disappearing here. got caught up with some things at work and life. i plan to pick this up again this weekend - and hopefully close all open comments!