cg3
cg-mwesplit adds extra newline
Compare the following (using giellalt/lang-sme as an example):
echo 'Jođiheaddji guovttosges' | hfst-tokenise -g tokeniser-gramcheck-gt-desc.pmhfst
"<Jođiheaddji guovttosges>"
"ges" Pcle Foc/ges <W:0.0> "<ges>"
"jođiheaddji guovttos" N Coll Sem/Group_Hum Sg Loc <W:0.0> "<Jođiheaddji guovttos>"
"ges" Pcle Foc/ges <W:0.0> "<ges>"
"jođiheaddji guovttos" N Coll Sem/Group_Hum Sg Nom <W:0.0> "<Jođiheaddji guovttos>"
:\n
echo 'Jođiheaddji guovttosges' | hfst-tokenise -g tokeniser-gramcheck-gt-desc.pmhfst | cg-mwesplit
"<Jođiheaddji guovttos>"
"jođiheaddji guovttos" N Coll Sem/Group_Hum Sg Loc <W:0.0>
"jođiheaddji guovttos" N Coll Sem/Group_Hum Sg Nom <W:0.0>
"<ges>"
"ges" Pcle Foc/ges <W:0.0>
:\n
After cg-mwesplit has been applied, there is an extra newline after the split cohorts that was not there in the input. Do you get the same, @unhammer?
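One way to make the difference easier to confirm might be to count empty lines in the stream before and after the split. This is only a rough sketch: `count_blank_lines` is a hypothetical helper name, not part of cg3 or hfst, and the demo input is made up so the snippet runs without the tokeniser.

```shell
# Hypothetical helper: count empty lines on stdin.
# grep -c exits non-zero when the count is 0, so "|| true" keeps
# the function from failing under "set -e"; it still prints 0.
count_blank_lines() {
  grep -c '^$' || true
}

# Self-contained demo with made-up input: one empty line between two lines.
printf 'a\n\nb\n' | count_blank_lines

# Against the real pipeline (assumes the lang-sme tokeniser file is present):
#   echo 'Jođiheaddji guovttosges' | hfst-tokenise -g tokeniser-gramcheck-gt-desc.pmhfst | count_blank_lines
#   echo 'Jođiheaddji guovttosges' | hfst-tokenise -g tokeniser-gramcheck-gt-desc.pmhfst | cg-mwesplit | count_blank_lines
```

If the second count is higher than the first, the extra newline comes from cg-mwesplit rather than from the tokeniser.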