sacremoses

Tokenizer -x option is confusing

Open ZJaume opened this issue 4 years ago • 3 comments

The usage text for the -x option says:

-x, --xml-escape               Escape special characters for XML.

But it actually does the same as the -no-escape option in Moses: it turns escaping off.

ZJaume, May 05 '20 15:05

Hmmm, true. The point was to keep the interface pythonic, but I agree it's confusing. Let me think of a better wording for the feature =)

alvations, Jun 04 '20 00:06

What about something like

-x, --no-xml-escape      Don't escape special characters for XML.

or just removing the short form -x and keeping only --no-xml-escape? If --no-xml-escape is too long, why not simply --no-escape like Moses?

I think it should at least have the "negation" in the help message, because the current wording is very confusing.
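To make the mismatch concrete, here is a minimal sketch using Python's standard argparse as a stand-in. It is not the sacremoses CLI code: the parser, the `build_parser` helper and the `tokenize-sketch` program name are all hypothetical. It only shows how a flag that switches a default-on feature off reads with the current wording versus the proposed one.

```python
import argparse


def build_parser(proposed: bool) -> argparse.ArgumentParser:
    """Hypothetical parser; escaping is on by default in both variants."""
    parser = argparse.ArgumentParser(prog="tokenize-sketch")
    if proposed:
        # Proposed wording: the name and the help text both carry the negation.
        parser.add_argument(
            "--no-xml-escape",
            dest="xml_escape",
            action="store_false",
            help="Don't escape special characters for XML.",
        )
    else:
        # Confusing wording: the flag disables escaping, but the name and
        # the help text suggest that it enables it.
        parser.add_argument(
            "-x",
            "--xml-escape",
            dest="xml_escape",
            action="store_false",
            help="Escape special characters for XML.",
        )
    return parser


current = build_parser(proposed=False)
proposed = build_parser(proposed=True)

print(current.parse_args([]).xml_escape)                    # escaping on by default
print(current.parse_args(["-x"]).xml_escape)                # -x switches it off
print(proposed.parse_args(["--no-xml-escape"]).xml_escape)  # same effect, clearer name
```

Both variants behave identically; only the second one tells the user what passing the flag will do.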

ZJaume, Jun 30 '20 13:06

Agreed, the option name and help text definitely do not make sense.

But then, does the default behaviour need to be that special XML characters are escaped (legacy behaviour from SMT/Moses)? I totally understand if the argument is that sacremoses should behave exactly like the original Moses tokenizer.
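For anyone unsure what the default changes in practice, a rough illustration follows. It uses whitespace splitting and `xml.sax.saxutils.escape` from the standard library as stand-ins, so the `tokenize_sketch` function is hypothetical and the set of escaped characters is not necessarily the one Moses or sacremoses uses.

```python
from typing import List
from xml.sax.saxutils import escape


def tokenize_sketch(text: str, xml_escape: bool = True) -> List[str]:
    """Hypothetical tokenizer: split on whitespace, optionally escape tokens."""
    tokens = text.split()
    if xml_escape:
        # Stand-in for the legacy default: escape &, < and > in every token.
        tokens = [escape(tok) for tok in tokens]
    return tokens


print(tokenize_sketch("AT&T < IBM"))                    # ['AT&amp;T', '&lt;', 'IBM']
print(tokenize_sketch("AT&T < IBM", xml_escape=False))  # ['AT&T', '<', 'IBM']
```

With escaping on by default, downstream tools that do not expect XML entities see `&amp;` and `&lt;` in the tokens, which is why the question of the default matters.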

bricksdont, Jul 01 '20 19:07