cargo-spellcheck Word "C++" is tokenized incorrectly and can not be whitelisted

Open ravenexp opened this issue 2 years ago • 3 comments

Describe the bug

It is not possible to whitelist the word "C++" by adding it to the local Hunspell dictionary.

Adding "^[cC][+][+]$" to the transform_regex list also does not help.

To Reproduce

Steps to reproduce the behaviour:

1. Create a file containing the word "C++".
2. Add "C++" to the local Hunspell dictionary.
3. Run cargo spellcheck ....
4. A spelling error message is displayed for every "+" in "C++".

Expected behavior

Hunspell finds "C++" in the local dictionary and accepts it as correct.

Screenshots

error: spellcheck(Hunspell)
    --> /home/x/y.md:252
     |
 252 | Specifically, the GNU C++ compiler version 8.2 or newer and
     |                        ^
     |   Possible spelling mistake found.
error: spellcheck(Hunspell)
    --> /home/x/y.md:252
     |
 252 | Specifically, the GNU C++ compiler version 8.2 or newer and
     |                         ^
     |   Possible spelling mistake found.

Please complete the following information:

System: Arch Linux
Obtained: pacman
Version: cargo-spellcheck 0.11.2

Aug 17 '22 12:08 ravenexp

Oh, I've accidentally found a workaround while figuring out how to make cargo-spellcheck not complain about "—" (EM-DASH).

Adding

transform_regex = [..., "^[+]$"]

to the config makes cargo-spellcheck accept "C++" as a correct word.
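The following is only a rough sketch of why the single-character pattern might succeed where "^[cC][+][+]$" did not. It assumes the tokenizer hands each "+" to the checker as its own token, which would match the two separate carets in the error output above. The `tokenize` and `flagged_tokens` helpers are hypothetical and are not cargo-spellcheck's actual implementation, and treating a full regex match as "accept this token" is likewise a simplification.

```python
import re


def tokenize(text):
    # Hypothetical tokenizer: a run of letters is one token,
    # any other non-space character is a token by itself.
    return re.findall(r"[A-Za-z]+|[^\sA-Za-z]", text)


def flagged_tokens(text, dictionary, allow_patterns):
    # Assumption: a token passes if it is in the dictionary
    # or if one of the allow patterns matches it.
    compiled = [re.compile(p) for p in allow_patterns]
    return [
        tok
        for tok in tokenize(text)
        if tok not in dictionary and not any(rx.search(tok) for rx in compiled)
    ]


dictionary = {"GNU", "C", "C++", "compiler"}
line = "GNU C++ compiler"

print(tokenize(line))
# The whole-word pattern never sees "C++" as one token, so both "+" stay flagged.
print(flagged_tokens(line, dictionary, [r"^[cC][+][+]$"]))
# The single-character pattern matches each "+" token, so nothing is flagged.
print(flagged_tokens(line, dictionary, [r"^[+]$"]))
```

Under these assumptions the dictionary entry "C++" is never consulted either, because no token equal to "C++" ever reaches the lookup.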

Aug 17 '22 12:08 ravenexp

The workaround is exactly that: allow + tokens. Tokenization is done by a third-party library and will never be perfect. Either wrap the word in backticks or add the workaround you found.

If you would like to make spellcheck aware of additional splitchars, there is tokenization_splitchars in [Hunspell].

Aug 17 '22 12:08 drahnr

> If you would like to make spellcheck aware of additional splitchars, there is tokenization_splitchars in [Hunspell].

Thanks, that's even better!

BTW, it's not mentioned in

https://github.com/drahnr/cargo-spellcheck/blob/master/docs/configuration.md

and I had to run cargo spellcheck config --stdout to find out about this parameter.

Aug 17 '22 12:08 ravenexp
