firefox-translations-training
Investigate monolingual cleaning
Based on @marco-c's feedback, we should investigate how the HPLT project cleans monolingual data and whether we should adjust our cleaning procedure.
Worth looking at: https://hplt-project.org/HPLT_D3_1___Software_for_cleaning_data_sets.pdf.
See also #247.