diacritics_restoration

Empty README in data

Open • stoianmihail opened this issue 5 years ago • 1 comment

If I want to generate the data for the Romanian language, how can I do that? Thanks a lot!

stoianmihail, Dec 26 '19 19:12

Hello @stoianmihail, have a look at https://github.com/arahusky/diacritics_restoration/tree/master/data/create_corpus_scripts, which contains a README. That folder stores scripts that automatically download clean monolingual data.

If you already have monolingual data, simply run https://github.com/arahusky/diacritics_restoration/blob/master/data/diacritization_stripping.py to remove the diacritics from it.
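For readers unsure what "removing diacritics" means in practice, here is a minimal Python sketch of the general idea. It is not the repository's diacritization_stripping.py, whose interface is not shown in this thread; the function name `strip_diacritics` and the sample sentence are made up for illustration. It assumes plain Unicode text and relies only on the standard `unicodedata` module.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical helper (not from the repository): drop combining marks."""
    # NFD splits each accented letter into a base letter plus combining marks,
    # e.g. "ț" becomes "t" followed by U+0326 (combining comma below).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the usual NFC form.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    # Illustrative Romanian sentence; prints: Stiinta si tehnica in Romania
    print(strip_diacritics("Știință și tehnică în România"))
```

Pairing each original line with its stripped version gives (input, target) examples for a restoration model. One limitation of this sketch: letters with no canonical decomposition, such as "đ" or "ł", pass through unchanged, so a real script may need an explicit mapping for them.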

arahusky, Jan 02 '20 17:01