parallel-corpora-tools

Remove character-level tokenized words

Open M4t1ss opened this issue 5 years ago • 0 comments

Remove sentences that have been tokenized at the character level, i.e. where the number of non-space characters is equal (or very close?) to the number of tokens.

English

( c o n t i n u a t i o n )

Slovenian

( n a d a l j e v a n j e )
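
A minimal sketch of how such a check might look, assuming whitespace-separated tokens. The function names `is_char_tokenized` and `filter_pairs`, the `min_ratio` threshold of 0.9 (standing in for "very close"), and the `min_tokens` guard are all hypothetical and not part of the existing tools. Dropping the whole pair when either side matches is also an assumption.

```python
def is_char_tokenized(sentence: str, min_ratio: float = 0.9, min_tokens: int = 3) -> bool:
    """Return True if nearly every token in the sentence is a single character."""
    tokens = sentence.split()
    # Very short lines (e.g. "a b") are skipped to avoid false positives (assumed guard).
    if len(tokens) < min_tokens:
        return False
    non_space_chars = sum(len(token) for token in tokens)
    # A ratio of 1.0 means the token count equals the non-space character count.
    return len(tokens) / non_space_chars >= min_ratio


def filter_pairs(pairs, min_ratio: float = 0.9):
    """Keep only the pairs where neither side looks character-level tokenized."""
    return [
        (src, tgt)
        for src, tgt in pairs
        if not (is_char_tokenized(src, min_ratio) or is_char_tokenized(tgt, min_ratio))
    ]
```

With the two examples above, `is_char_tokenized("( c o n t i n u a t i o n )")` and `is_char_tokenized("( n a d a l j e v a n j e )")` both have a ratio of exactly 1.0, so the pair would be removed. Lowering `min_ratio` makes the filter more aggressive; the right value would need checking against real data.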

M4t1ss, Jan 09 '19 10:01