Benoit Favre

Results 20 comments of Benoit Favre

Did you install the version of transformers from requirements? This kind of error is typical of a mismatch in Bert model class structure.

This typically happens when you have a different version of transformers installed compared to the one the model was trained with. I don't know about vosk, but the models provided...

You need to be more specific about your issue. What kind of dataset did you process? What accuracy are you expecting on that genre? What is the accuracy of typical...

Thanks for reporting those issues. For the first one, could you be more specific about the sequence of commands you have run and the data you used. Do you observe...

We miss a tokenizer that preserves offsets from the source text in order to insert punctuation without altering the text. Currently, a set of rules is applied for detokenization, and...

I don't know about Russian BERTs but what you want to care about is tokeization. In particular the preprocessing stage needs to normalize punctuation in a tokenization-neutral manner.

This is a byproduct of how tokenization is performed in the Flaubert model. I am afraid the only way to handle it is to retrain a model with a different...

The tokenizer is included in the recasepunc.py script. It is a modified version of the BERT tokenizer which keeps punctuation when lower-casing. Example usage: ``` xzcat cc-100/$lang.txt.xz | python recasepunc.py...

Another use case: disable on metered connections.

Did you have a look at requirements.freeze.txt?