Non-word characters flagged as typos

Open ErikAnderson3 opened this issue 4 years ago • 3 comments

This concerns the older Hunspell version 1.2.7, as bundled in the MemoQ translation memory software package.

Using the large EN-US dictionary and affix files from the SCOWL project: http://wordlist.aspell.net/dicts/

Hunspell is confusingly flagging non-word characters as typos, when these characters are used singly, with whitespace on either side. Testing has identified the following as problematic:

  • 　 (double-byte space, U+3000)
  • ↑ (upward arrow, U+2191)
  • ← (leftward arrow, U+2190)
  • @ ("at" mark, U+0040)
  • ＠ (double-byte "at" mark, U+FF20)
  • ^ (caret, U+005E)
  • _ (underscore, U+005F)
  • | (vertical bar, U+007C)
  • \ (backslash, U+005C)
  • ` (backtick, U+0060)

The double-byte characters are not terribly surprising, but other double-byte characters that are not used in English (such as Japanese kana) are not flagged as typos, which is strange.

It is also strange that ↓ (downward arrow, U+2193) and → (rightward arrow, U+2192) are not flagged, while ↑ (upward arrow, U+2191) and ← (leftward arrow, U+2190) are flagged.
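
To rule out look-alike characters, the code points listed above can be double-checked with a short script. This is only a sketch using Python's standard `unicodedata` module; the names `flagged`, `not_flagged`, and `describe` are made up for illustration, and the script does not call Hunspell at all.

```python
import unicodedata

# Characters reported as flagged when used singly between spaces.
flagged = ["\u3000", "\u2191", "\u2190", "\u0040", "\uff20",
           "\u005e", "\u005f", "\u007c", "\u005c", "\u0060"]
# The two arrows that are reported as NOT flagged.
not_flagged = ["\u2193", "\u2192"]


def describe(ch):
    """Return (code point, general category, Unicode name) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        unicodedata.name(ch, "<unnamed>"),
    )


for label, chars in (("flagged", flagged), ("not flagged", not_flagged)):
    for ch in chars:
        print(label, *describe(ch))
```

Comparing the category column for the flagged and non-flagged arrows shows whether Unicode itself treats them differently, which may help narrow down whether the difference comes from the dictionary, the affix file, or the host application.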

I have tried various settings in the affix file to get Hunspell to ignore these characters, such as "ICONV ↑ ↓" to convert the problematic characters to known-good ones, but no dice.
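
For anyone trying to reproduce this, the following is a sketch of how an ICONV table is laid out in an .aff file according to the Hunspell manual: a count line first, then one pattern/replacement pair per line. It assumes the affix file declares UTF-8 encoding; the second mapping is purely illustrative, and it is not confirmed here that v1.2.7 honors ICONV at all.

```
SET UTF-8
ICONV 2
ICONV ↑ ↓
ICONV ← →
```

If the count line is missing or does not match the number of pairs, the table may not be read as intended, so that is worth checking before drawing conclusions.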

  • Is this a known issue with Hunspell v1.2.7?
      • If so, is there any known workaround? I am not as knowledgeable as I'd like about all the various options possible in the affix files.
  • Is this perhaps some artifact of how Hunspell v1.2.7 is integrated into memoQ, and should I be reporting to them instead?

Any advice appreciated!

ErikAnderson3, Jul 13 '21 20:07

Bump.

ErikAnderson3, May 19 '22 16:05