deepspeech-websocket-server

allowing nested calls after recognition

Open ezavesky opened this issue 5 years ago • 0 comments

I would like to chain additional processing steps after the recognition has been completed. This allows the inclusion of other cool things to be executed on top of the speech alone: sentiment analysis, topic understanding, speaker detection, etc.

Here's a rough sketch of the concept...

  • Each module may ship its own "server" file that launches a separate service, so that they don't complicate the existing single-server architecture
  • Each would communicate over web calls (REST) to avoid inter-process confusion; in the future, this could be expanded into something more rigorous like a message queue.
  • Each can communicate via stored JSON/metadata or audio files written to disk
  • Each can "register" itself with the main speech server as a secondary process on start-up. For example, the "speaker detection" module will (a) launch it's own service, (b) register with primary server, (c) accept REST calls and reply with JSON / text as required
  • Just one example, but each module could leverage other OSS like uis-rnn or pyannote-audio (both taken from this great repo of examples)
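The register-then-chain flow above could be sketched in-process as follows. This is only an illustration of the concept, not an implementation proposal: the registry class, module names, and handlers here are all hypothetical, and real modules would run as separate services invoked over REST rather than as plain callables.

```python
import json

class PostProcessorRegistry:
    """Hypothetical registry of secondary modules chained after recognition."""

    def __init__(self):
        self._modules = {}  # module name -> callable(result_dict) -> dict

    def register(self, name, handler):
        # Step (b) in the sketch: a module announces itself on start-up.
        # In the real design this would be a REST call to the speech server.
        self._modules[name] = handler

    def run(self, recognition_result):
        # After recognition completes, pass the result through every
        # registered module; each contributes its own JSON metadata (step c).
        enriched = dict(recognition_result)
        for name, handler in self._modules.items():
            enriched[name] = handler(enriched)
        return enriched

registry = PostProcessorRegistry()

# Toy "sentiment" stand-in; a real module might wrap an external service
# such as one built on uis-rnn or pyannote-audio.
registry.register(
    "sentiment",
    lambda r: {"label": "positive" if "great" in r["text"] else "neutral"},
)

result = registry.run({"text": "deepspeech is great"})
print(json.dumps(result))
```

Keeping the chaining logic behind a single `run()` call means the primary server never needs to know which modules exist, only that each one accepts the accumulated result and returns extra metadata.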

Seeking opinions at this point, with more details to be fleshed out later. Of course, eventually we may convert this suite into a package (e.g. satisfying #2), but that's not paramount right now.

ezavesky · Sep 05 '19 12:09