alvations
The normalization bug in sacremoses happens here:
- https://github.com/alvations/sacremoses/blob/master/sacremoses/normalize.py#L41 and
- https://github.com/alvations/sacremoses/blob/master/sacremoses/normalize.py#L43
Thanks @j0hannes for catching this. #78 should fix it, but it should be rechecked against the Moses decoder repo too.
After the #78 fix, your cleaning workflow for your input would be something like:
1. First normalize your input
2. Then detokenize it (that's assuming you know that the original...
Yes, the example I gave is one of the typical pipelines that people use to clean data for machine translation. What's the expected output in your example? Do...
Ah, do you mean something like:

```python
>>> from sacremoses import MosesDetokenizer
>>> md = MosesDetokenizer(lang='en')
>>> text = "yesterday 's reception"
>>> md.detokenize(text.split())
"yesterday's reception"
```

But with the...
Actually, this part on adding a new apostrophe to the detokenization process isn't simple, https://github.com/alvations/sacremoses/blob/master/sacremoses/tokenize.py#L678 Because:
- There's some smart quote counting happening
- And the de-spacing of the apostrophe might be...
Maybe try https://sjmielke.com/papers/tokenize/ or spaCy for your use-case. I can take a look at this again without changing the detokenization behavior, but no promises, because supporting non-normalized text opens a...
After considering the different options, I think rolling click back to 7.0 would be better.
Which model? Do you mean the truecasing model? Other than that, there's no real model training in sacremoses; it's mostly writing and testing lots of regex rules =)
May I ask which preprocessing task you are referring to in sacremoses? The truecaser? For the other tasks, there's no training involved and the rules are manually defined 😅