STT-models
Verify alphabet in pb* and tflite models
The alphabet files from Jaco models are inconsistent with the output of the models at runtime. It has been observed that the Jaco Spanish model can produce accented vowels, but the alphabet file does not include them. The alphabet file should be confirmed and uploaded to the zoo for language model generation.
TFModelState::init and TFLiteModelState::init can be modified to print out the loaded alphabet used to train the model here: https://github.com/coqui-ai/STT/blob/653ce25a7ce5bd6cbb564416d847d8afcd5c5e8c/native_client/tfmodelstate.cc#L120
Maybe the above is the cause of the problem we are seeing in a dockerized ARM environment when using the Jaco models for Spanish with the Python (3.9) bindings.
https://github.com/coqui-ai/STT/issues/2284
The correct alphabet files seem to be the following: https://gitlab.com/Jaco-Assistant/Scribosermo/-/tree/deepspeech/data
As you said, the Spanish alphabet includes accented vowels. The alphabets for other languages, such as French and Polish, include accented characters as well.
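A quick way to check a candidate alphabet file against what a model actually outputs might look like the sketch below. It assumes the usual alphabet.txt layout (one label per line, lines starting with `#` treated as comments, `\#` standing for a literal hash). The function names `load_alphabet` and `missing_characters` are made up for this example and are not part of the STT bindings.

```python
def load_alphabet(lines):
    """Collect labels from alphabet-file lines.

    Assumed format: one label per line, '#' starts a comment line,
    and a line beginning with a backslash followed by '#' is a literal '#'.
    """
    labels = set()
    for raw in lines:
        line = raw.rstrip("\r\n")  # keep a lone space, which is a valid label
        if line.startswith("\\#"):
            labels.add("#")
        elif line.startswith("#"):
            continue  # comment line
        elif line != "":
            labels.add(line)
    return labels


def missing_characters(transcript, labels):
    """Return the sorted characters of `transcript` that are not in `labels`."""
    return sorted({ch for ch in transcript if ch not in labels})


if __name__ == "__main__":
    # Hypothetical alphabet without accented characters: space plus a-z.
    sample_lines = ["# sample alphabet\n", " \n"]
    sample_lines += [c + "\n" for c in "abcdefghijklmnopqrstuvwxyz"]
    labels = load_alphabet(sample_lines)

    # Hypothetical transcript, standing in for real model output.
    print(missing_characters("adi\u00f3s se\u00f1or", labels))
```

To run it on real data, pass `open("alphabet.txt", encoding="utf-8")` to `load_alphabet` and feed it transcripts produced by the model. Any character reported as missing means the alphabet file does not match the model's output.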