vosk-server
There are some mistakes in align_lexicon.txt for the French model 0.22
Hello, we are testing the Vosk ASR platform in French. We chose the model vosk-model-fr-0.22 because of its Apache 2 license. The g2p used to generate align_lexicon.txt has some bugs. One of them concerns the treatment of acronyms. In French, the usual pronunciation of c.... is "K", except in acronyms, where it is "S E", because the letters of an acronym are spelled out one by one.
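To illustrate the acronym rule, here is a minimal sketch of spelling out an acronym letter by letter. Only the "S E" pronunciation of C comes from the description above; the other letter names, the phoneme symbols, and the helper name `spell_acronym` are hypothetical and would need to be replaced with the phoneme set actually used by the model.

```python
# Letter-name pronunciations used when an acronym is spelled out.
# "C" -> "S E" is taken from the issue text; the other entries and
# their phoneme symbols are hypothetical placeholders.
LETTER_NAMES = {
    "C": ["S", "E"],
    "N": ["E", "N"],
    "R": ["E", "R"],
    "S": ["E", "S"],
}


def spell_acronym(word):
    """Return the phoneme sequence obtained by spelling each letter of word."""
    phones = []
    for letter in word.upper():
        if letter not in LETTER_NAMES:
            raise KeyError(f"no letter name defined for {letter!r}")
        phones.extend(LETTER_NAMES[letter])
    return phones


if __name__ == "__main__":
    # Print a candidate pronunciation for one acronym.
    print("CNRS", " ".join(spell_acronym("CNRS")))
```

The output of such a helper would still have to be converted to the exact line format and phone inventory of align_lexicon.txt before it could replace the g2p result.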
To continue our tests, we need to fine-tune the LM with the corrected lexicon. Is it possible to have the arpa file to test our corrections?
In return, of course, we will provide you with the new lexicon.
Best regards, Seb
Is it possible to have the arpa file to test our corrections ?
Sure, you have to write an email to [email protected] and describe your project to get a link to the French model update package.
Hello,
We will start checking and correcting the model. To make this easier, we chose to use the French dataset from Lingua Libre, which is a very complete annotated dataset. We will generate a 1-gram graph (aka a model without graph) to check only the acoustic model and the lexicon. We will then pass the audio files to Vosk and look at the words whose transcription differs from the annotation. With this result we can find words with wrong phonemes, and possibly identify phonemes that the AM has difficulty with. We will do this in a notebook. If you are interested, we can share it.
Best regards, Sébastien
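The comparison step could look roughly like the sketch below. It is not the notebook itself. It assumes the recordings are 16-bit mono PCM WAV files and that each one has a reference annotation. The function names `transcribe`, `normalize` and `find_mismatches` are hypothetical. The Vosk calls (`KaldiRecognizer`, `AcceptWaveform`, `FinalResult`) come from the vosk Python package, which is imported lazily so that the comparison helpers can be used without it.

```python
import json
import unicodedata
import wave


def transcribe(wav_path, model):
    """Return the text Vosk recognizes for one 16-bit mono PCM WAV file."""
    from vosk import KaldiRecognizer  # requires the vosk package

    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            rec.AcceptWaveform(data)
    return json.loads(rec.FinalResult())["text"]


def normalize(text):
    """Apply Unicode NFC, lowercase, and collapse whitespace before comparing."""
    return " ".join(unicodedata.normalize("NFC", text).lower().split())


def find_mismatches(pairs):
    """Given (annotation, recognized) pairs, return the pairs that differ."""
    return [(ref, hyp) for ref, hyp in pairs if normalize(ref) != normalize(hyp)]
```

A typical use would be to build the pairs with `transcribe` over the dataset, then inspect the output of `find_mismatches` word by word to separate lexicon errors from acoustic model errors.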
Sure, that would be nice.