
Special tag for all unrecognized symbols

Open • mansayk opened this issue 5 years ago • 1 comment

Hello!

Many non-whitespace symbols are neither recognized by Apertium's tagger nor marked in any way. For example, apertium-tat does not recognize the following symbols: _ @ % ~ | and thousands of others.

Is it possible to use some special tag for such cases, e.g. ^_/_<unknown>$ or ^_/_<sym>$?

Without tagging, Apertium's output is difficult to process. Streamparser, too, either leaves such characters in its "blank" variable or skips them.
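To make the request concrete, here is a minimal Python sketch of a post-processing workaround. It is not part of lttoolbox or streamparser. It copies lexical units (`^...$`) and superblanks (`[...]`, including nested `[[...]]`) unchanged and turns every other non-whitespace character in the blanks into its own lexical unit with a catch-all tag. The function names, the default tag `sym`, and the set of characters it backslash-escapes are assumptions for illustration only.

```python
"""Sketch: tag non-whitespace characters left in the blanks of an
Apertium stream. Hypothetical helper, not an lttoolbox/streamparser API."""

# ASSUMPTION: characters backslash-escaped when written inside a lexical
# unit; a conservative subset of the stream format's reserved characters.
RESERVED = set("^$/<>[]{}\\@*#+")


def _lu(char: str, tag: str) -> str:
    """Build ^c/c<tag>$ for one blank character, escaping reserved ones."""
    surface = "\\" + char if char in RESERVED else char
    return f"^{surface}/{surface}<{tag}>$"


def _span_end(stream: str, i: int) -> int:
    """Index of the delimiter closing the ^...$ or [...] span opened at i.

    Backslash escapes are skipped; brackets are depth-counted so that
    nested "[[...]]" blanks are copied as a single span.
    """
    opener = stream[i]
    depth = 0
    j = i
    while j < len(stream):
        ch = stream[j]
        if ch == "\\":
            j += 2  # skip the escaped character
            continue
        if opener == "^":
            if ch == "$":
                return j
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return j
        j += 1
    return len(stream) - 1  # unterminated span: copy the rest as-is


def tag_blank_symbols(stream: str, tag: str = "sym") -> str:
    """Wrap each non-whitespace blank character as ^c/c<tag>$."""
    out = []
    i = 0
    while i < len(stream):
        c = stream[i]
        if c in "^[":
            # Lexical unit or superblank: copy unchanged.
            j = _span_end(stream, i)
            out.append(stream[i:j + 1])
            i = j + 1
        elif c == "\\" and i + 1 < len(stream):
            # Escaped character in a blank, e.g. "\@": a literal symbol.
            out.append(_lu(stream[i + 1], tag))
            i += 2
        elif c.isspace() or c == "\0":
            # Real whitespace (and NUL, used for null-flushing) stays a blank.
            out.append(c)
            i += 1
        else:
            out.append(_lu(c, tag))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(tag_blank_symbols("^сүз/сүз<n>$ _ ^бар/бар<adj>$"))
```

Running the script prints `^сүз/сүз<n>$ ^_/_<sym>$ ^бар/бар<adj>$`, i.e. the `_` that was previously hidden in a blank now shows up as a lexical unit that streamparser would yield like any other. Passing `tag="unknown"` gives the other spelling proposed above.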

mansayk • Nov 14 '18 15:11

This would be a really useful feature; I know we have often discussed it at conferences and summer schools. I would prefer all (non-whitespace?) characters to be tagged like this, but even an option to tag every codepoint unknown to the dictionary as something would be a good starting point.

flammie • Nov 15 '18 11:11