vscode-spell-checker
C/C++: Some misspelled words are not detected
Misspelled words which are not detected: avalible, handeled, evalulated, deciced, pressent, senting.
It detected those words for me.
I searched all the dictionaries; the words were not found. What programming language are you using?
Hi, thanks for looking into this. I'm using C++ (and the official C++ extension). I didn't do any special setup of the spell checker other than choosing which file types are checked and adding some words to the workspace dictionary (definitely not these ones, but I will check).
This happens because most people glue words together when programming in C++, e.g. errorhandler. To account for that, the spell checker allows compound words. Each of your examples splits into valid words: dec°iced, press°ent, han°deled.
I do not think the way it currently works is ideal.
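To illustrate why compound matching hides these typos, here is a minimal Python sketch of a plain word-break check. It is not the extension's implementation: the tiny WORDS set and the is_compound function are hypothetical, and the 3-letter minimum piece length is an assumption. Any word that can be cut entirely into known words is accepted, which is how dec°iced and press°ent slip through.

```python
from functools import lru_cache

# Hypothetical toy word list; the real extension uses far larger dictionaries.
WORDS = frozenset({"dec", "iced", "press", "ent", "han", "deled", "error", "handler"})


def is_compound(word: str, words: frozenset[str] = WORDS, min_len: int = 3) -> bool:
    """Return True if `word` can be split entirely into known words.

    Every piece must be in `words` and at least `min_len` letters long
    (an assumed threshold for this sketch, not an extension setting).
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def can_split(start: int) -> bool:
        # Reaching the end means every earlier piece was a known word.
        if start == len(word):
            return True
        for end in range(start + min_len, len(word) + 1):
            if word[start:end] in words and can_split(end):
                return True
        return False

    return can_split(0)
```

With this list, is_compound("pressent") returns True (press + ent) while is_compound("avalible") returns False, which mirrors the behaviour reported above.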
The plan is to change C/C++ compound matching to match against noun compounds instead of compounds made up of any dictionary words. Words like errorcode, returncode, htmlelement, messagehandler and errormessage are the kinds of compounds it would accept as correct. This would help with suggestions as well, using patterns like noun{1,3} or (verb)(noun){,3}.
I am running into this issue during code reviews and it causes quite a bit of grief. Specifically, words like evalute and GetMsgSrollTime (should be GetMsgScrollTime) are not being detected.
Many developers use a naming convention to separate the individual words in an identifier (e.g. camelCase, PascalCase, and snake_case). It would be great if the extension could take advantage of this to check for misspelled words. Unfortunately, it is difficult to predict which naming convention a developer is using, so an option to control how the extension parses compound words could work well here. The option would have a checklist of common compound-word naming conventions (including compound words written in all lowercase letters) that the extension would know to treat as compound words. If camelCase is enabled, evalUte would be okay. If camelCase is disabled, evalUte would be treated as one word, "evalute," and would be incorrect. If all-lowercase is enabled, evalute would be okay. If all-lowercase is disabled, evalute would be incorrect. I think you get the picture.
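The proposed option could behave something like the following Python sketch, which splits an identifier only on the conventions that are enabled. The convention names, the CAMEL_BOUNDARY regex and split_identifier are hypothetical illustrations of the proposal, not existing extension settings or code.

```python
import re

# Hypothetical names for the proposed checklist entries.
CAMEL_CASE = "camelCase"
SNAKE_CASE = "snake_case"

# Zero-width boundary: a lowercase letter or digit followed by an uppercase letter.
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def split_identifier(name: str, conventions: set[str]) -> list[str]:
    """Split `name` into words using only the enabled naming conventions.

    Anything that is not split stays as one word and would be checked whole,
    so 'evalUte' with camelCase disabled is checked as 'evalUte' ('evalute').
    PascalCase boundaries are treated the same as camelCase in this sketch.
    """
    parts = [name]
    if SNAKE_CASE in conventions:
        parts = [p for part in parts for p in part.split("_") if p]
    if CAMEL_CASE in conventions:
        parts = [p for part in parts for p in CAMEL_BOUNDARY.split(part)]
    return parts
```

For example, split_identifier("evalUte", {CAMEL_CASE}) gives ["eval", "Ute"], while with no conventions enabled the identifier stays whole and would be flagged.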
There's also the case where any naming convention could be used (including uncommon ones). Consider something like eVaLUtE: if the any-naming-convention option is enabled, the extension would behave as it does today (it would not flag the word based on its case changes) and would interpret the word as eVaL + UtE. I hope it's obvious that nobody would think eVaLUtE is spelled correctly, since humans follow patterns instead of chaos.
I see the same with Python. For example, "singal" is not detected, presumably because it's "sing" + "al".
I thought that the cSpell.allowCompoundWords setting would control this behavior, but apparently not...
@kit1980 you are right, it is because allowCompoundWords is turned on for Python and C/C++.
To turn off allowCompoundWords for a language, you need to override it at the language level. The following will turn off compound word matching for C/C++ and Python:
"cSpell.languageSettings": [
    {
        "languageId": "c,cpp,python",
        "allowCompoundWords": false
    }
]
Thanks for the fix! This should really be the default behavior IMHO (or at least the default behavior needs some case-matching refinement). Add me to the list of people who pushed code with typos because of this.
My plan is to turn allowCompoundWords off by default. To do that, I have been working on a way to define compoundable words. It is a simple syntax:
error*
*code
+infix+
+msg
- * : optional compound
- + : required compound
With this definition, valid words include:
error, code, errorcode, errormsg, errorinfixmsg
The following are some of the words that are not allowed:
codemsg, msg
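A minimal Python sketch of how a checker could apply this syntax, assuming a word must be cut into a sequence of defined entries where every join is allowed on both sides. Entry, parse_entry, is_valid and the recursive matching strategy are hypothetical and are not how cspell implements it.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Entry:
    word: str
    before: str  # "" = nothing may precede, "*" = optional, "+" = required
    after: str   # same markers, for what may follow


def parse_entry(line: str) -> Entry:
    """Parse a definition such as 'error*', '*code', '+infix+' or '+msg'."""
    line = line.strip()
    before = line[0] if line.startswith(("*", "+")) else ""
    after = line[-1] if len(line) > 1 and line.endswith(("*", "+")) else ""
    return Entry(line[len(before):len(line) - len(after)].lower(), before, after)


def is_valid(word: str, entries: list[Entry]) -> bool:
    """Return True if `word` is one entry or a chain of entries joined legally."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def match(start: int, first: bool) -> bool:
        for e in entries:
            if not e.word or not word.startswith(e.word, start):
                continue
            if first and e.before == "+":
                continue  # this piece must be preceded by another piece
            if not first and not e.before:
                continue  # this piece may not be preceded by another piece
            end = start + len(e.word)
            if end == len(word):
                if e.after != "+":
                    return True  # last piece, and nothing is required after it
            elif e.after and match(end, False):
                return True  # this piece allows a follower and the rest matches
        return False

    return match(0, True)


DEFINITION = ["error*", "*code", "+infix+", "+msg"]
ENTRIES = [parse_entry(line) for line in DEFINITION]
```

Under these assumed rules, every word in the valid list passes and codemsg and msg fail; the same parser also reproduces the accepted and rejected examples for the compound-words.txt file further down the thread.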
Is this the reason why servie isn't correctly checked? In a plain text file:
https://user-images.githubusercontent.com/30010/156350957-74843052-b408-4481-9bae-6f75b3ae7aa9.mp4
Yes it was. Sorry for the noise.
@PEZ,
You can use the cspell trace command to check:
npx cspell trace --language-id=cpp servie

Ah, sweet!
The description of the compound words setting tells us that it might make misspelled words look correct. It would be nice if it also told us that the setting can be disabled per language. I was getting frustrated with all the undetected typos in Markdown, like insructions, but disabling compound words for Markdown helps a lot. I'd rather have false positives (correct words flagged) than false negatives (misspelled words not flagged). Is there also a way to disable it in code comments, i.e. so that compound words would only be allowed in code, not in natural-language text?
@mwermelinger,
allowCompoundWords is now off by default. It has been the cause of many complaints.
I continue to strongly urge against setting allowCompoundWords to true.
I think a better practice is to just add the common compound words to a custom dictionary.
It is possible to define a custom compound dictionary:
cspell.config.yaml
dictionaryDefinitions:
  - name: code-compounds
    description: Custom Dictionary for compound words
    path: ./compound-words.txt
    addWords: true
languageSettings:
  - caseSensitive: false
    languageId: cpp,c,python,javascript
    dictionaries:
      - code-compounds
compound-words.txt
*code*
*error*
*errors*
*help*
+end
begin+
+middle+
array
Only words with * or + will be combined.
- * : optional compound
- + : required compound (the word can only appear as part of a compound on that side)
Jason, thanks for the reply, but I'm afraid I don't understand the approach of having to explicitly list the compound words. How would cSpell accept identifiers like dayTimeUserMessage unless we add all those (and many other) words to the dictionary? Adding words to the dictionary as needed seems a very labour-intensive approach, unless I'm missing something. Thanks in advance for any clarification.
Forget it. Senior moment: snake and camel case are not considered compound words.
Exactly. The spell checker is able to split snake and camel case. It will even handle ERRORcode and ERRORCode. With an identifier like ERRORCode, it will try both (ERRORC, ode) and (ERROR, code). It will handle IFrame, but not iframe.
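The two readings of ERRORCode can be sketched in Python as below. The tokenizing regex, the rule that the last capital of an uppercase run may instead start the following lowercase word, and the name case_split_candidates are assumptions for illustration, not cspell's actual splitter. Underscores are skipped by the tokenizer, so snake_case identifiers split as well.

```python
import itertools
import re


def case_split_candidates(identifier: str) -> list[list[str]]:
    """Return every candidate word split of `identifier` based on case changes."""
    # Runs of uppercase letters, lowercase letters or digits; '_' is skipped.
    runs = re.findall(r"[A-Z]+|[a-z]+|[0-9]+", identifier)
    choices: list[list[list[str]]] = []  # alternatives for each segment
    i = 0
    while i < len(runs):
        run = runs[i]
        nxt = runs[i + 1] if i + 1 < len(runs) else ""
        if run.isupper() and nxt.islower():
            if len(run) == 1:
                options = [[run + nxt]]  # 'C' + 'ode' -> 'Code'
            else:
                options = [
                    [run, nxt],                # 'ERRORC' + 'ode'
                    [run[:-1], run[-1] + nxt],  # 'ERROR' + 'Code'
                ]
            choices.append(options)
            i += 2
        else:
            choices.append([[run]])
            i += 1
    # One candidate per combination of the per-segment alternatives.
    return [[part for group in combo for part in group] for combo in itertools.product(*choices)]
```

In this sketch, case_split_candidates("ERRORCode") returns [["ERRORC", "ode"], ["ERROR", "Code"]], and "iframe" stays a single word because there is no case change to split on.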
Using the compound syntax above, the following are considered correct:
- errorcode, codeerrors, errorserrorerrors, beginend, beginmiddleend, begincode
Not accepted:
- codebegin, endcode, enderror