recasepunc

Model for recasing and repunctuating ASR transcripts

Results: 11 recasepunc issues

In order to train a model on Russian data from Web Crawl, do you suggest a specific pre-trained BERT model?

Hi, in French we use hyphens in some situations, and recasepunc loses them. Here is a reproduction of the bug: ``` console $ cat input.txt salut toto comment vas-tu y a-t-il...
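To check whether hyphenated words survive restoration, one can compare the hyphenated words in the input with those in the model output. The sketch below is a hypothetical diagnostic helper (`lost_hyphenated_words` is not part of recasepunc), and the restored string in the example is invented for illustration, since the actual output in the report is truncated.

```python
import re

# A word with at least one internal hyphen, e.g. "vas-tu" or "a-t-il".
# In Python 3, \w is Unicode-aware, so accented letters also match.
HYPHENATED = re.compile(r"\b\w+(?:-\w+)+\b")


def lost_hyphenated_words(source: str, restored: str) -> list[str]:
    """Return hyphenated words from `source` whose hyphenated form does not
    appear (case-insensitively) among the hyphenated words of `restored`."""
    kept = {w.lower() for w in HYPHENATED.findall(restored)}
    return [w for w in HYPHENATED.findall(source) if w.lower() not in kept]


# Hypothetical restored output in which the hyphens were dropped.
print(lost_hyphenated_words("salut toto comment vas-tu y a-t-il",
                            "Salut toto, comment vas tu ? Y a t il"))
```

Comparing case-insensitively matters here because recasing is expected to change capitalization (for example `vas-tu` becoming `Vas-tu`) without that counting as a lost hyphen.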

The docs say that all models are trained on the first 100M tokens. Can you share an example of how to prepare those 100M tokens from the text input? I'm trying to train support for...
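Assuming "tokens" means whitespace-separated words (recasepunc's own definition may differ; it could, for example, count wordpiece tokens), a corpus can be cut to its first N tokens with a small streaming filter such as the hypothetical `first_n_tokens` below. It keeps line boundaries, truncates the last line so the total is exactly N (or fewer if the input runs out), skips empty lines, and collapses repeated whitespace within each line.

```python
from typing import Iterable, Iterator


def first_n_tokens(lines: Iterable[str], n: int) -> Iterator[str]:
    """Yield lines of whitespace-separated tokens until `n` tokens in total
    have been emitted. Streams the input, so it works on very large files."""
    remaining = n
    for line in lines:
        if remaining <= 0:
            return
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        taken = tokens[:remaining]  # truncate the final line if needed
        remaining -= len(taken)
        yield " ".join(taken)
```

To apply it to a file, pass an open handle such as `open("corpus.txt", encoding="utf-8")` with `n=100_000_000` and write each yielded line followed by a newline; the file name is a placeholder.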

I tried to use the French models (both `fr.22000` and `fr-txt.large.19000`) on a very simple text, "j'aime les fleurs les olives et la raclette". When running `python3 recasepunc.py predict...

When I use the Russian model, it gives me this error: ``` WARNING: reverting to cpu as cuda is not available Some weights of the model checkpoint at DeepPavlov/rubert-base-cased were not...

I am trying to use the pretrained German model (https://alphacephei.com/vosk/models/vosk-recasepunc-de-0.21.zip) and, as described in the README, I run `python example.py de-test.txt`, but I keep getting the following error: AttributeError: Can't get attribute...

Hi, thank you for this repo! I'm trying to reproduce the results for a different language, so I'm using multilingual-bert fine-tuned on a dataset in my language. Everything goes well during preprocessing and training,...

Look at the parameters below. They really do end up as bool; I found this bug while debugging. ''' if __name__ == '__main__': parser = argparse.ArgumentParser() parser.add_argument("action", help="train|eval|predict|tensorize|preprocess", type=str) ... parser.add_argument("--updates", help="number of...
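The truncated report does not show which parameters are affected. One common way argparse options silently turn into booleans is declaring them with `type=bool`: argparse passes the raw string to `bool()`, and any non-empty string, including `"False"`, becomes `True`. The sketch below reproduces that pitfall with hypothetical `--buggy-flag` and `--fixed-flag` options and shows the usual `action="store_true"` fix; whether this is the exact bug in recasepunc is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("action", help="train|eval|predict|tensorize|preprocess", type=str)
    # Hypothetical option showing the pitfall: argparse calls bool() on the raw
    # string, so "--buggy-flag False" yields True (any non-empty string is truthy).
    parser.add_argument("--buggy-flag", type=bool, default=False)
    # Hypothetical option showing the fix: store_true takes no value and is
    # False unless the flag is present on the command line.
    parser.add_argument("--fixed-flag", action="store_true")
    # Numeric options need an explicit numeric type; argparse's default type is str.
    parser.add_argument("--updates", type=int)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["train", "--buggy-flag", "False", "--updates", "5"])
    print(args.buggy_flag, args.fixed_flag, args.updates)  # True False 5
```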

Hello, thanks for the work done here. I tried to punctuate a text written in French, but the output wasn't very accurate. How can I improve the results? Thanks.

Hello, I have tested the French model and in general it works great. One issue for me is the tokenization step: words containing ' are split in two, so l'empire...
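The desired behavior is cut off in the report, so this is an assumption: if the goal is to keep elided and hyphenated French forms such as l'empire or aujourd’hui as single words, a regex pre-tokenizer that treats internal apostrophes (straight ' and typographic ’) and hyphens as part of the word is one option. `french_pretokenize` is a hypothetical helper, not recasepunc's tokenizer, and how to plug it into recasepunc's preprocessing is not covered here.

```python
import re

# A word is a run of word characters, optionally joined by internal
# apostrophes (' or ’) or hyphens; any other non-space character is
# emitted as its own token.
WORD_OR_PUNCT = re.compile(r"\w+(?:['’-]\w+)*|[^\w\s]")


def french_pretokenize(text: str) -> list[str]:
    """Split text into words and punctuation, keeping l'empire, aujourd’hui
    and vas-tu as single tokens."""
    return WORD_OR_PUNCT.findall(text)


print(french_pretokenize("l'empire, aujourd’hui vas-tu ?"))
```

Other punctuation is still emitted as separate tokens. A wordpiece tokenizer applied afterwards would still split each word into subwords internally; this only changes where word boundaries fall.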