Non-word characters flagged as typos
This concerns the older Hunspell version 1.2.7, as bundled with the memoQ translation memory software package, used with the large EN-US dictionary and affix files from the SCOWL project: http://wordlist.aspell.net/dicts/
Hunspell is flagging non-word characters as typos when they appear on their own, with whitespace on either side. Testing has identified the following characters as problematic:
- 　 (double-byte space, U+3000)
- ↑ (upward arrow, U+2191)
- ← (leftward arrow, U+2190)
- @ ("at" mark, U+0040)
- ＠ (double-byte "at" mark, U+FF20)
- ^ (caret, U+005E)
- _ (underscore, U+005F)
- | (vertical bar, U+007C)
- \ (backslash, U+005C)
- ` (backtick, U+0060)
It is not terribly surprising that the double-byte characters are flagged, but other double-byte characters that are not used in English (such as Japanese kana) are not flagged as typos, which is strange.
It is also strange that ↓ (downward arrow, U+2193) and → (rightward arrow, U+2192) are not flagged, while ↑ (upward arrow, U+2191) and ← (leftward arrow, U+2190) are flagged.
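To check whether anything in the characters' Unicode properties separates the flagged ones from the unflagged ones, the code points can be compared side by side. The following is only a rough diagnostic sketch using Python's standard `unicodedata` module. The names `FLAGGED`, `NOT_FLAGGED`, and `describe` are made up for illustration, and the script says nothing about how Hunspell itself classifies these characters.

```python
import unicodedata

# Characters reported above as flagged when standing alone.
FLAGGED = ["\u3000", "\u2191", "\u2190", "\u0040", "\uFF20",
           "\u005E", "\u005F", "\u007C", "\u005C", "\u0060"]

# Characters reported above as NOT flagged.
NOT_FLAGGED = ["\u2193", "\u2192"]


def describe(ch):
    """Return (code point, general category, name, East Asian width)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.east_asian_width(ch),
    )


if __name__ == "__main__":
    for label, chars in (("flagged", FLAGGED), ("not flagged", NOT_FLAGGED)):
        print(label)
        for ch in chars:
            # Tab-separated so the columns are easy to compare.
            print("  " + "\t".join(describe(ch)))
```

If the four arrows turn out to share the same category and width, the difference in behaviour presumably comes from somewhere else, such as the dictionary, the affix file, or the tokenizer of the host application.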
I have tried various settings in the affix file to get Hunspell to ignore these characters, such as "ICONV ↑ ↓" to convert the problematic characters to known-good ones, but no dice.
- Is this a known issue with Hunspell v1.2.7?
  - If so, is there any known workaround? I am not as knowledgeable as I'd like about all the various options possible in the affix files.
- Is this perhaps some artifact of how Hunspell v1.2.7 is integrated into memoQ, and should I be reporting to them instead?
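One way to narrow down the second question might be to run the same characters through a standalone Hunspell build, outside memoQ. The sketch below only generates a small UTF-8 test file with each character isolated by whitespace. The file name `repro.txt` and the helper names are hypothetical. Note that the command-line tool does its own tokenization, so a clean result there would suggest, but not prove, that the host application's tokenizer is involved.

```python
from pathlib import Path

# Flagged characters from the list above, plus the two unflagged arrows
# as a control group.
TEST_CHARS = ["\u3000", "\u2191", "\u2190", "\u0040", "\uFF20",
              "\u005E", "\u005F", "\u007C", "\u005C", "\u0060",
              "\u2193", "\u2192"]


def build_repro_text(chars):
    """Put each character on its own line between two ordinary words."""
    return "".join(f"before {ch} after\n" for ch in chars)


def write_repro_file(path, chars):
    """Write the test text as UTF-8 (hypothetical helper)."""
    Path(path).write_text(build_repro_text(chars), encoding="utf-8")


if __name__ == "__main__":
    write_repro_file("repro.txt", TEST_CHARS)
    # Assumed follow-up step, run by hand with a standalone hunspell binary
    # and the same SCOWL dictionary files:
    #   hunspell -d en_US -l repro.txt
    # (-l lists the words hunspell considers misspelled)
```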
Any advice appreciated!
Bump.