parallel-corpora-tools

Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.

Issues (1):

Remove sentences where the number of non-space characters is equal (or very close?) to the number of tokens. English > ( c o n t i n u a t...
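The heuristic proposed in this issue can be sketched as a characters-per-token ratio. Spaced-out text such as `( c o n t i n u a t i o n )` has roughly one non-space character per token, while ordinary prose averages several. This is a minimal sketch, not the repository's implementation. It assumes whitespace tokenization via `str.split()`. The threshold `max_ratio=1.5` is a hypothetical stand-in for "equal (or very close?)". The rule that a parallel pair is dropped when *either* side is spaced out is also an assumption. The function names are illustrative.

```python
from typing import List, Tuple


def chars_per_token(sentence: str) -> float:
    """Average number of non-space characters per whitespace token."""
    tokens = sentence.split()  # hypothetical choice: plain whitespace tokenization
    if not tokens:
        return 0.0
    # Joining the split tokens recovers every non-whitespace character,
    # so this sum is the non-space character count.
    non_space = sum(len(token) for token in tokens)
    return non_space / len(tokens)


def is_spaced_out(sentence: str, max_ratio: float = 1.5) -> bool:
    """True if the sentence looks like letter-by-letter spaced text.

    max_ratio is an assumed threshold for "very close" to one
    character per token; empty lines are left to other filters.
    """
    if not sentence.split():
        return False
    return chars_per_token(sentence) <= max_ratio


def filter_parallel(
    pairs: List[Tuple[str, str]], max_ratio: float = 1.5
) -> List[Tuple[str, str]]:
    """Keep only pairs where neither side is spaced out (assumed policy)."""
    return [
        (src, tgt)
        for src, tgt in pairs
        if not is_spaced_out(src, max_ratio) and not is_spaced_out(tgt, max_ratio)
    ]
```

For example, `is_spaced_out("( c o n t i n u a t i o n )")` gives a ratio of 1.0 and flags the line, while `"The cat sat on the mat."` averages 3.0 characters per token and is kept. A fixed threshold will also catch very short lines made of one-letter words, so in practice the cutoff would need tuning per language.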