diart icon indicating copy to clipboard operation
diart copied to clipboard

feat: Add WebSocket server with multi-client support

Open janaab11 opened this issue 11 months ago • 11 comments

Overview

Implements a WebSocket server that can handle audio streams from multiple client connections

Changes

  • Added multi-client support to WebSocket server
  • Created StreamingInferenceHandler for managing connections
  • Added Dockerfile for easier deployment

Testing

  • Tested with multiple concurrent clients
  • Verified Docker container functionality
  • Checked resource cleanup on disconnection

Please let me know if any changes or improvements are needed!


Fixes #252

janaab11 avatar Dec 22 '24 17:12 janaab11

Couple things I still want to work on:

  1. The way resources are shared between client connections. Currently, each connection shares the same config but the models (for segmentation and embedding, in SpeakerDiarization) are initialised and maintained separately. This is a more flexible design, but quite wasteful.

    • I have attempted a fix at this, but parallel connections ended up sharing state. From what I understood, I might need to dig deeper into the Aggregation steps in the pipeline.
  2. Related to the above point. I was also wondering if different configs could share the same underlying model resources at runtime. For eg. I have seen performance differ a lot for the same config when number of speakers are far apart - say, 2 vs 10. And this is a parameter that a client is better suited to configure, when setting up the connection to the server.

janaab11 avatar Dec 22 '24 17:12 janaab11

I have also added a cleanup step in the server, when a client disconnects. This was mostly to ensure explicit memory management since client streams are not sharing resources, at the moment - but this should also address #255

janaab11 avatar Dec 22 '24 18:12 janaab11

Moved to LazyModel for resource management, based off this comment. Client-specific Pipeline instances now share resources that are initialised in a common PipelineConfig instance.

Still unsure about how this would scale with client connections - would appreciate any thoughts on this!

janaab11 avatar Dec 31 '24 15:12 janaab11

@janaab11 about your question concerning the wasteful model copies, I fully agree with this limitation. However, I think it would be best suited for a separate PR, it's a pretty big amount of work and I would hate to delay the multi-client feature because of it. Glad to discuss it in a future PR if that interests you!

juanmc2005 avatar Jan 02 '25 16:01 juanmc2005

@janaab11 about your question concerning the wasteful model copies, I fully agree with this limitation. However, I think it would be best suited for a separate PR, it's a pretty big amount of work and I would hate to delay the multi-client feature because of it. Glad to discuss it in a future PR if that interests you!

Definitely interested in resolving this - happy to take it up after closing the work here!

janaab11 avatar Jan 03 '25 00:01 janaab11

WebSocketAudioSource: I do agree it is acting as a proxy, but prefer keeping it as an AudioSource subclass - makes it consistent with other audio sources. Would like to think more about this and then propose changes.

Oh it would definitely still be a subclass of AudioSource, my suggestion was simply to move it to websockets.py so we hide it from the end user. I don't think such a proxy audio source would be needed under normal circumstances

juanmc2005 avatar Jan 03 '25 10:01 juanmc2005

Oh it would definitely still be a subclass of AudioSource, my suggestion was simply to move it to websockets.py so we hide it from the end user. I don't think such a proxy audio source would be needed under normal circumstances

Okay this does make sense to me too - not exposing such websocket-specific functionality. Made the changes!

janaab11 avatar Jan 03 '25 15:01 janaab11

Added error handling for the following edge cases - in send, close and _on_message_received methods:

if client is None:
    return
if client_id not in self._clients:
    return

These edge-cases can occur due to race conditions in the client lifecycle (connect/disconnect/cleanup) or network issues that lead to client state mismatches between the server and client. Added warnings to catch these async timing issues, and documented the edge-case conditions in the respective method's docstring.

janaab11 avatar Jan 03 '25 21:01 janaab11

Modified client.py to handle disconnects properly on KeyboardInterrupt events - i believe this was referenced in another issue as well. Do let me know if the implementation is too involved? I had liked the simplicity of the client before this.

Apart from this, complete documentation in the README is pending. Will get to that next.

janaab11 avatar Jan 03 '25 23:01 janaab11

@janaab11 do you have any updates on this PR? How can we unblock it?

juanmc2005 avatar Feb 10 '25 11:02 juanmc2005

hi @juanmc2005 sorry for disappearing here. got caught up with some things at work and life. i plan to pick this up again this weekend - and hopefully close all open comments!

janaab11 avatar Mar 12 '25 16:03 janaab11