cargo-spellcheck
Word "C++" is tokenized incorrectly and cannot be whitelisted
Describe the bug
It is not possible to whitelist the word "C++" by adding it to the local Hunspell dictionary.
Adding "^[cC][+][+]$" to the transform_regex list also does not help.
To Reproduce
Steps to reproduce the behaviour:
- A file containing the word "C++"
- Add "C++" into the local Hunspell dictionary.
- Run cargo spellcheck ...
- A spelling error message is displayed for every "+" in "C++".
Expected behavior
Hunspell finds "C++" in the local dictionary and accepts it as correct.
Screenshots
error: spellcheck(Hunspell)
--> /home/x/y.md:252
|
252 | Specifically, the GNU C++ compiler version 8.2 or newer and
| ^
| Possible spelling mistake found.
error: spellcheck(Hunspell)
--> /home/x/y.md:252
|
252 | Specifically, the GNU C++ compiler version 8.2 or newer and
| ^
| Possible spelling mistake found.
Please complete the following information:
- System: Arch Linux
- Obtained: pacman
- Version: cargo-spellcheck 0.11.2
Oh, I've accidentally found a workaround while figuring out how to make cargo-spellcheck
not complain about "—" (EM-DASH).
Adding transform_regex = [..., "^[+]$"] to the config makes cargo-spellcheck accept "C++" as a correct word.
The workaround is, yes, exactly this: allow "+" tokens. Tokenization is done by a third-party library and will never be perfect. Either wrap the word in ``` or add the workaround you found.
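To illustrate why a pattern for the whole word never fires while a pattern for a single "+" does, here is a minimal Python sketch. It is not cargo-spellcheck's code: tokenize and misspelled are hypothetical helpers, the splitting rule is an assumption inferred from the error output above (one error per "+"), and transform_regex is modelled only as a per-token allow-list.

```python
import re


def tokenize(text):
    # Hypothetical tokenizer (an assumption, not the real third-party one):
    # a run of word characters is one token, and every other
    # non-whitespace character becomes a token of its own.
    return re.findall(r"\w+|[^\w\s]", text)


def misspelled(text, dictionary, transform_regex):
    # Each pattern is tried against individual tokens, never against the
    # original word, so a pattern can only match what the tokenizer emits.
    allowed = [re.compile(pattern) for pattern in transform_regex]
    return [
        token
        for token in tokenize(text)
        if token not in dictionary
        and not any(pattern.search(token) for pattern in allowed)
    ]


# "C++" is in the dictionary, but it never shows up as a single token.
dictionary = {"GNU", "C", "C++", "compiler"}
line = "GNU C++ compiler"

print(tokenize(line))
print(misspelled(line, dictionary, [r"^[cC][+][+]$"]))
print(misspelled(line, dictionary, [r"^[+]$"]))
```

Under these assumptions the dictionary entry and the "^[cC][+][+]$" pattern are both compared against "C", "+" and "+" separately, so neither can match, whereas "^[+]$" matches each stray "+" token.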
If you would like to make spellcheck aware of additional splitchars, there is tokenization_splitchars in [Hunspell].
Thanks, that's even better!
BTW, it's not mentioned in https://github.com/drahnr/cargo-spellcheck/blob/master/docs/configuration.md and I had to run cargo spellcheck config --stdout to find out about this parameter.