cdec
Tokenizer + tags
Tokenizing bitext with <p> tags fails:
echo "x ||| <p>x</p>" | ~/tools/cdec/corpus/tokenize-anything.sh
x |||<p> x</p>
Why these tags are in my corpus is another problem :tired_face:
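A possible stopgap, sketched here as an assumption rather than a fix to the tokenizer itself, is to strip the tags before the text reaches tokenize-anything.sh. The helper name `strip_p_tags` is hypothetical, and it only handles bare `<p>` and `</p>` tags (no attributes):

```shell
# Hypothetical helper (not part of cdec): remove bare <p> and </p> tags
# from every line read on stdin, leaving the ||| separator untouched.
strip_p_tags() {
  sed -E 's#</?p>##g'
}

# Demo on the input from the report.
printf '%s\n' 'x ||| <p>x</p>' | strip_p_tags

# Assumed usage with the tokenizer path from the report:
# echo "x ||| <p>x</p>" | strip_p_tags | ~/tools/cdec/corpus/tokenize-anything.sh
```

This sidesteps the problem instead of solving it, so it is only useful if the tags carry no information that the downstream pipeline needs.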