recasepunc issues

Training on Russian

5

In order to train a model on Russian dara from Web Crawl, do you suggest a specifc pre-trained bert model?

Dashes are removed from output

1

Hi, In French, we have dash is some situtation. recasepunc lost them. Here is a reproduction of the bug: ``` console $ cat input.txt salut toto comment vas-tu y a-t-il...

bersace

what way is used for Tokenize the dataset?

1

Doc, say: All models are trained from the 1st 100M tokens Can share some example how prepare that 100M tokens from the text input?, I'm trying to train support for...

cdgraff

RuntimeError when predicting with the french models

3

I tried to use the french models (both `fr.22000` and `fr-txt.large.19000`) on a very simple text: > j'aime les fleurs les olives et la raclette When running `python3 recasepunc.py predict...

maelchiotti

Russian model doesn't work, while English does

1

When I use Russian model, it gives me this error: ``` WARNING: reverting to cpu as cuda is not available Some weights of the model checkpoint at DeepPavlov/rubert-base-cased were not...

xenia19

While running pretrained German model: AttributeError: Can't get attribute 'Trie' on <module 'transformers.tokenization_utils'

2

I am trying to use pretrained German model: https://alphacephei.com/vosk/models/vosk-recasepunc-de-0.21.zip and as mentioned in readme file, I run: python example.py de-test.txt but I keep getting following error: AttributeError: Can't get attribute...

alihashaam

Cannot use trained model for validation or prediction

5

Hi, thank you for this repo! I'm trying to reproduce results for different language, so I'm using multilingual-bert fine-tuned to my language dataset. Everything goes well during preprocessing and training,...

khusainovaidar

parameters like --dab_rate can't be set from cmd line bc they are bool

look at parameters below. They really became bool, i find this bug while debugging it. ''' if __name__ == '__main__': parser = argparse.ArgumentParser() parser.add_argument("action", help="train|eval|predict|tensorize|preprocess", type=str) ... parser.add_argument("--updates", help="number of...

al-zatv

Low puncatuation accuracy for French

1

Hello, Thanks for the work done here. I tried to punctuate a text written in French, but the output result wasn't too accurate. How can I improve the results? Thanks.

BMouhcine

Words with ' are split on tokenization step

2

Hello, I have tested French model and in general it works great. One issue for me is on tokenization step. The words with ' are split on 2, so l'empire...

marlon-br

recasepunc
recasepunc copied to clipboard

Metadata

Training on Russian

Dashes are removed from output

what way is used for Tokenize the dataset?

RuntimeError when predicting with the french models

Russian model doesn't work, while English does

While running pretrained German model: AttributeError: Can't get attribute 'Trie' on <module 'transformers.tokenization_utils'

Cannot use trained model for validation or prediction

parameters like --dab_rate can't be set from cmd line bc they are bool

Low puncatuation accuracy for French

Words with ' are split on tokenization step

← Metadata

Owner

Metadata

recasepunc recasepunc copied to clipboard

Metadata

← Metadata

Owner

Metadata

recasepunc
recasepunc copied to clipboard