recasepunc
Model for recasing and repunctuating ASR transcripts
To train a model on Russian data from a web crawl, do you suggest a specific pre-trained BERT model?
Hi, in French we use dashes in some situations, and recasepunc loses them. Here is a reproduction of the bug: ``` console $ cat input.txt salut toto comment vas-tu y a-t-il...
The docs say: "All models are trained from the 1st 100M tokens". Can you share an example of how to prepare those 100M tokens from the text input? I'm trying to train support for...
I tried to use the French models (both `fr.22000` and `fr-txt.large.19000`) on a very simple text: > j'aime les fleurs les olives et la raclette When running `python3 recasepunc.py predict...
When I use the Russian model, it gives me this error: ``` WARNING: reverting to cpu as cuda is not available Some weights of the model checkpoint at DeepPavlov/rubert-base-cased were not...
I am trying to use the pretrained German model: https://alphacephei.com/vosk/models/vosk-recasepunc-de-0.21.zip and, as mentioned in the README file, I run: python example.py de-test.txt but I keep getting the following error: AttributeError: Can't get attribute...
Hi, thank you for this repo! I'm trying to reproduce the results for a different language, so I'm using multilingual-bert fine-tuned on a dataset in my language. Everything goes well during preprocessing and training,...
Look at the parameters below. They really do become bool; I found this bug while debugging. ''' if __name__ == '__main__': parser = argparse.ArgumentParser() parser.add_argument("action", help="train|eval|predict|tensorize|preprocess", type=str) ... parser.add_argument("--updates", help="number of...
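One way to check a report like this is to parse a sample command line and inspect the type of each resulting value. The sketch below uses a small hypothetical parser, not the actual recasepunc.py argument list (which is elided above): the `--updates` default and the `--flag` option are assumptions made for illustration. It also shows a well-known argparse pitfall, `type=bool`, which turns any non-empty string into `True`; the report does not confirm whether that is the cause here.

```python
import argparse


def build_parser():
    # Hypothetical parser for illustration only; it does not mirror
    # the real recasepunc.py options, which are elided in the report.
    parser = argparse.ArgumentParser()
    parser.add_argument("action", type=str)
    parser.add_argument("--updates", type=int, default=1000)
    # Pitfall: argparse calls bool("False"), and any non-empty string is truthy.
    parser.add_argument("--flag", type=bool, default=False)
    return parser


def arg_types(argv):
    """Return a mapping from argument name to the type name of its parsed value."""
    args = build_parser().parse_args(argv)
    return {name: type(value).__name__ for name, value in vars(args).items()}


types = arg_types(["train", "--updates", "500", "--flag", "False"])
args = build_parser().parse_args(["train", "--flag", "False"])
print(types)
print(args.flag)
```

If an option that should hold a number or a string shows up as `bool` in such a dump, its `type=` or `action=` declaration is the first place to look.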
Hello, thanks for the work done here. I tried to punctuate a text written in French, but the output wasn't very accurate. How can I improve the results? Thanks.
Hello, I have tested the French model and in general it works great. One issue for me is in the tokenization step. Words with ' are split in two, so l'empire...
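A possible workaround on the output side is to rejoin elided forms after prediction. This is a minimal sketch, assuming the split shows up as the elided prefix followed by whitespace (for example `l' empire`); the exact output format is cut off in the report, and `rejoin_elisions` is a hypothetical helper, not part of recasepunc.

```python
import re

# Assumption: a split elision looks like "<letters>'" + whitespace + next word.
# Both the ASCII apostrophe and the typographic one are handled.
_SPLIT_ELISION = re.compile(r"\b(\w+['’])\s+(?=\w)")


def rejoin_elisions(text):
    """Remove the whitespace between an elided prefix (l', j', ...) and the next word."""
    return _SPLIT_ELISION.sub(r"\1", text)


print(rejoin_elisions("l' empire"))
print(rejoin_elisions("j' aime les fleurs"))
```

The pattern is deliberately simple, so it would also glue a closing single quote to the following word; text that uses `'` as a quotation mark would need a stricter rule, such as a fixed list of French elision prefixes.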