
Feature request: Multiple Parallel/Concatenatable Models

Open · mo-g opened this issue 1 year ago · 0 comments

Honestly, this software is a black box to me, so this may be an inherently unachievable concept, but I didn't see an open or closed issue requesting anything similar, so I thought it was worth asking.

**Is your feature request related to a problem? Please describe.**
My project will need to understand given and last names from a very wide linguistic base, something that would be unachievably large to retrain every time a new one needs to be added.

**Describe the solution you'd like**
I'd like to be able to train multiple separate models, e.g. "generic terms", "team names", "English names", "Polish names", "Spanish names", "Mandarin names", and load them in parallel so that STT can run inference on complete phrases containing words from multiple sources. This would allow training to be chunked, and models could be loaded selectively to reduce computational load by only inferencing with the scope needed for where the software is installed. A rough sketch of what I'm imagining follows below.
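To make it concrete, here is a hypothetical sketch against the Python bindings. `addExternalScorer()` is invented for illustration and does not exist today; the current API only takes a single scorer via `enableExternalScorer()`, and the file names are placeholders:

```python
# Hypothetical API sketch: addExternalScorer() is invented for illustration;
# the current bindings expose only one scorer via enableExternalScorer().
import wave
import numpy as np
from stt import Model

model = Model("model.tflite")

# Load several domain-specific scorers side by side instead of one
# monolithic scorer that must be retrained whenever vocabulary is added.
for scorer in ("generic-terms.scorer", "english-names.scorer", "polish-names.scorer"):
    model.addExternalScorer(scorer)  # hypothetical call

# Read 16 kHz, 16-bit mono PCM audio, as the models expect.
with wave.open("call-request.wav", "rb") as wav:
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

# Decoding would draw candidate words from all loaded scorers, so one
# utterance could mix vocabulary from several sources.
print(model.stt(audio))
```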

**Describe alternatives you've considered**
Loading the models in parallel and running inference several times at once would, I guess, be the alternative, but I don't know how I would select which words to use from each output; I'm not massively experienced with this software yet. It would also presumably be less efficient than a single instance with multiple language models, since the acoustic model and "boilerplate" would then be duplicated.
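For reference, a minimal sketch of that alternative using the existing API, assuming the `Metadata`/`CandidateTranscript` shape from the DeepSpeech-derived bindings and naively keeping the highest-confidence transcript (file names are placeholders):

```python
# Sketch of the run-several-instances alternative with the existing API.
# Each instance duplicates the acoustic model, which is the inefficiency
# described above.
import wave
import numpy as np
from stt import Model

SCORERS = ["generic-terms.scorer", "english-names.scorer", "polish-names.scorer"]

with wave.open("call-request.wav", "rb") as wav:
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

results = []
for scorer_path in SCORERS:
    model = Model("model.tflite")            # acoustic model reloaded per scorer
    model.enableExternalScorer(scorer_path)  # one scorer per instance
    best = model.sttWithMetadata(audio, 1).transcripts[0]
    text = "".join(token.text for token in best.tokens)
    results.append((best.confidence, text))

# Naive merge: keep whichever scorer produced the most confident transcript.
# Decoder confidences from different scorers aren't really comparable, and
# this picks one whole phrase rather than mixing words across scorers,
# which is exactly the selection problem described above.
confidence, text = max(results)
print(text)
```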

**Additional context**
This is for an open-source system to allow voice-controlled calling between users, via statements such as "call [recipient]" or "[initiator] to [recipient]". I can see something similar was planned for the original DeepSpeech but never made it into code before the project ended.

mo-g · Oct 31 '22 14:10