vscode-spell-checker C/C++: Some misspelled words are not detected

Misspelled words which are not detected: avalible, handeled, evalulated, deciced, pressent, senting.

May 21 '19 21:05 ambrop72

It detected those words for me.

I searched all the dictionaries and the words were not found. What programming language are you using?

May 22 '19 06:05 Jason3S

Hi, thanks for looking into this. I'm using C++ (and the official C++ extension). I didn't do any special setup of the spell checker other than which file types are checked and adding some words to the workspace dictionary (definitely not these ones, I will check).

May 22 '19 06:05 ambrop72

This happens because many people programming in C++ glue words together, as in errorhandler. To account for that, the spell checker allows compound words. Each of your examples can be split into multiple valid words: dec°iced, press°ent, han°deled
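
As a rough illustration of why such words slip through, here is a minimal Python sketch of a naive compound matcher. It is not the extension's actual implementation: the `WORDS` set, the `is_compound` function, and the three-letter minimum are all assumptions made for this example, with the word list taken from the splits listed above.

```python
# Hypothetical mini-dictionary for illustration; a real dictionary is far larger.
WORDS = {"dec", "iced", "press", "ent", "han", "deled", "error", "handler"}


def is_compound(word, words=WORDS, min_len=3):
    """Return True if `word` is a dictionary word or a chain of dictionary words."""
    word = word.lower()
    if word in words:
        return True
    # Try every split point that leaves at least `min_len` letters on each side.
    for i in range(min_len, len(word) - min_len + 1):
        if word[:i] in words and is_compound(word[i:], words, min_len):
            return True
    return False


for candidate in ["errorhandler", "deciced", "pressent", "handeled", "avalible"]:
    print(candidate, is_compound(candidate))
```

With this toy word list the intended compound errorhandler is accepted, but so are deciced, pressent, and handeled, because each happens to split into listed words. Only avalible is rejected.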

May 23 '19 13:05 Jason3S

I too do not think the way it currently works is ideal.

The plan is to change C/C++ compound matching to match against noun compounds instead of compounds made up of arbitrary words. errorcode, returncode, htmlelement, messagehandler, and errormessage are the kinds of words it would accept. This would help with suggestions as well. The patterns would be things like noun{1,3} or (verb)(noun){,3}.
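
One way to picture the noun-compound idea is the Python sketch below. It is only an illustration under stated assumptions: the `NOUNS` and `VERBS` sets are a made-up toy lexicon, and `tag_sequences` and `is_noun_compound` are hypothetical helpers, not part of the spell checker.

```python
import re

# Toy part-of-speech lexicon (assumption for this sketch only).
NOUNS = {"error", "code", "html", "element", "message", "handler"}
VERBS = {"get", "set"}

# One to three nouns, or a verb followed by up to three nouns.
PATTERN = re.compile(r"N{1,3}|VN{0,3}")


def tag_sequences(word):
    """Yield a tag string such as 'NN' or 'VN' for every way to split `word`."""
    if not word:
        yield ""
        return
    for i in range(1, len(word) + 1):
        head = word[:i]
        for tag, lexicon in (("N", NOUNS), ("V", VERBS)):
            if head in lexicon:
                for rest in tag_sequences(word[i:]):
                    yield tag + rest


def is_noun_compound(word):
    """Accept `word` only if some split matches the noun-compound pattern."""
    return any(PATTERN.fullmatch(tags) for tags in tag_sequences(word.lower()))


for candidate in ["errorcode", "htmlelement", "geterrorcode", "errorget", "deciced"]:
    print(candidate, is_noun_compound(candidate))
```

In this sketch deciced is rejected simply because neither half is tagged as a noun or verb in the toy lexicon, and errorget is rejected because a noun followed by a verb does not fit either pattern.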

Jun 04 '19 15:06 Jason3S

I am running into this issue during code reviews and it causes quite a bit of grief. Specifically, words like evalute and GetMsgSrollTime (should be GetMsgScrollTime) are not being detected.

Many developers use a naming convention to separate individual words in an identifier name (e.g. camelCase, PascalCase, and snake_case). It would be great if the extension could take advantage of this to check for misspelled words. Unfortunately, it is difficult to predict which naming convention a developer is using. Therefore, an option to control how the extension parses compound words could work well for this issue. The option would have a checklist of common compound word naming conventions (including compound words using all lowercase letters) that the extension would know to treat as compound words. If camelCase is enabled, evalUte would be okay. If camelCase is disabled, evalUte would be treated as one word, "evalute," and would be incorrect. If alllowercase is enabled, evalute would be okay. If alllowercase is disabled, evalute would be incorrect. I think you get the picture.

There's also the case where any naming convention could be used (including uncommon ones). Considering something like eVaLUtE, if "any naming convention" is enabled, the extension would behave as it does today (it does not detect individual words based on case changes). The extension would interpret that word as eVaL + UtE. I hope it's obvious that nobody would think that eVaLUtE is correctly spelled, since humans follow patterns instead of chaos.
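
The proposed option could behave roughly like the Python sketch below. Everything in it is hypothetical: the convention names, the `split_identifier` and `misspelled_parts` helpers, and the toy `WORDS` set (which includes "eval" and "ute" only to mirror the evalUte example) are assumptions, and "camelCase" is taken to cover PascalCase as well.

```python
import re

# Toy dictionary (assumption for this sketch only).
WORDS = {"get", "msg", "scroll", "time", "eval", "ute", "evaluate"}


def split_identifier(identifier, conventions):
    """Split an identifier using only the enabled naming conventions."""
    parts = [identifier]
    if "snake_case" in conventions:
        parts = [p for part in parts for p in part.split("_") if p]
    if "camelCase" in conventions:
        # Break before an uppercase letter that follows a lowercase letter or digit.
        parts = [p for part in parts
                 for p in re.split(r"(?<=[a-z0-9])(?=[A-Z])", part)]
    return [p.lower() for p in parts]


def is_glued(word, words):
    """True if `word` is a dictionary word or dictionary words glued together."""
    if word in words:
        return True
    return any(word[:i] in words and is_glued(word[i:], words)
               for i in range(1, len(word)))


def misspelled_parts(identifier, conventions, words=WORDS):
    """Return the parts of `identifier` that fail the check."""
    allow_glued = "alllowercase" in conventions
    bad = []
    for part in split_identifier(identifier, conventions):
        ok = is_glued(part, words) if allow_glued else part in words
        if not ok:
            bad.append(part)
    return bad


print(misspelled_parts("GetMsgSrollTime", {"camelCase"}))
print(misspelled_parts("evalute", {"camelCase"}))
print(misspelled_parts("evalute", {"alllowercase"}))
```

With only camelCase enabled, the sketch flags the "sroll" part of GetMsgSrollTime and flags evalute as a single unknown word. Enabling alllowercase lets evalute through as eval + ute, which is the behavior being complained about.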

Sep 19 '19 16:09 kkaja123

I see the same with Python. For example, "singal" is not detected, presumably because it's "sing" + "al".

I thought that the cSpell.allowCompoundWords setting would control this behavior, but apparently not...

Dec 17 '19 03:12 kit1980

@kit1980 you are right, it is because allowCompoundWords is turned on for Python and C/C++.

To turn off allowCompoundWords for a language, you need to override it at the language level:

The following will turn off compound word matching for C/C++ and Python:

    "cSpell.languageSettings": [
        {
            "languageId": "c,cpp,python",
            "allowCompoundWords": false
        }
    ]

Dec 23 '19 14:12 Jason3S

Thanks for the fix! This should really be the default behavior IMHO (or at least the default behavior needs some case-matching refinement). Add me to the list of people who pushed code with typos because of this.

Jan 21 '20 22:01 jharrang

My plan is to turn allowCompoundWords off by default. To do that, I have been working on a way to define compoundable words. It is a simple syntax:

error*
*code
+infix+
+msg

* - optional compound
+ - required compound

With this definition valid words are:

error, code, errorcode, errormsg, errorinfixmsg

The following are some of the words that are not allowed:

codemsg, msg
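
To make the marker rules concrete, here is a minimal Python sketch that checks words against the definition above. It reflects one reading of the syntax, not the checker's real code: a marker on a side means the entry may (`*`) or must (`+`) join another entry on that side, and no marker means it cannot. The names `parse_entry` and `is_valid` are hypothetical.

```python
def parse_entry(entry):
    """Split an entry such as '+infix+' into (word, left_marker, right_marker)."""
    left = entry[0] if entry[0] in "*+" else ""
    right = entry[-1] if entry[-1] in "*+" else ""
    word = entry[len(left):len(entry) - len(right)]
    return word, left, right


def is_valid(word, entries):
    """Return True if `word` can be built from `entries` under the marker rules."""
    rules = [parse_entry(e) for e in entries]

    def match(rest, prev_right):
        # prev_right is None at the start of the word; otherwise it is the
        # right-hand marker of the part that was just consumed.
        for part, left, right in rules:
            if not rest.startswith(part):
                continue
            if prev_right is None:
                if left == "+":  # '+' on the left needs a preceding part
                    continue
            elif prev_right == "" or left == "":
                continue  # both neighbours must permit joining
            tail = rest[len(part):]
            if not tail:
                if right != "+":  # '+' on the right needs a following part
                    return True
            elif match(tail, right):
                return True
        return False

    return match(word, None)


ENTRIES = ["error*", "*code", "+infix+", "+msg"]
for candidate in ["error", "code", "errorcode", "errormsg",
                  "errorinfixmsg", "codemsg", "msg"]:
    print(candidate, is_valid(candidate, ENTRIES))
```

Under this reading the sketch accepts error, code, errorcode, errormsg, and errorinfixmsg, and rejects codemsg and msg, which matches the lists above.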

Jan 22 '20 10:01 Jason3S

Is this the reason why servie isn't correctly checked? In a plain text file:

https://user-images.githubusercontent.com/30010/156350957-74843052-b408-4481-9bae-6f75b3ae7aa9.mp4

Mar 02 '22 11:03 PEZ

Is this the reason why servie isn't correctly checked?

Yes, it was. Sorry for the noise.

Mar 02 '22 11:03 PEZ

@PEZ,

You can use the cspell trace command to check.

npx cspell trace --language-id=cpp servie

Mar 02 '22 11:03 Jason3S

Ah. sweet!

Mar 02 '22 11:03 PEZ

The description of the compound words setting tells us that it might make misspelled words look correct. It would be nice if it also told us that the setting can be disabled per language. I was getting frustrated with all the undetected typos in Markdown, like insructions, but disabling compound words for Markdown helps a lot. I'd rather have false positives (a correct word flagged) than false negatives (a misspelled word not flagged). Is there also a way to disable it in code comments, i.e. compound words would only be allowed in code, not in natural language text?

Feb 06 '23 13:02 mwermelinger

@mwermelinger,

allowCompoundWords is now off by default. It has been the cause of many complaints.

I continue to strongly urge against setting allowCompoundWords to true.

I think a better practice is to just add the common compound words to a custom dictionary.

It is possible to define a custom compound dictionary:

cspell.config.yaml

dictionaryDefinitions:
  - name: code-compounds
    description: Custom Dictionary for compound words
    path: ./compound-words.txt
    addWords: true

languageSettings:
  - caseSensitive: false
    languageId: cpp,c,python,javascript
    dictionaries:
      - code-compounds

compound-words.txt

*code*
*error*
*errors*
*help*
+end
begin+
+middle+
array

Only words with * or + will be combined.

* - optional compound
+ - only part of a compound

Feb 06 '23 14:02 Jason3S

Jason, thanks for the reply but I'm afraid I don't understand the approach of having to explicitly list the compound words. How would cSpell accept identifiers like dayTimeUserMessage, unless we add all those (and many other) words to the dictionary? Seems a very labour intensive approach to add words to the dictionary as needed, unless I'm missing some point. Thanks in advance for any clarification.

Feb 06 '23 15:02 mwermelinger

Forget it. Senior moment: snake and camel case are not considered compound words.

Feb 06 '23 15:02 mwermelinger

snake and camel case are not considered compound words.

Exactly. The spell checker is able to split snake and camel case. It will even handle ERRORcode and ERRORCode. With an identifier like ERRORCode it will try both (ERRORC, ode) and (ERROR, code). It will handle IFrame, but not iframe.

Using the compound syntax above, the following are considered correct:

errorcode, codeerrors, errorserrorerrors, beginend, beginmiddleend, begincode

Not accepted:

codebegin, endcode, enderror
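
The ERRORCode handling mentioned earlier in this comment can be pictured with the Python sketch below. It is a simplified illustration, not the checker's actual splitting logic: it only looks at identifiers made of one uppercase run followed by lowercase letters, and `split_candidates`, `is_accepted`, and the toy `WORDS` set are assumptions.

```python
import re

# Toy dictionary (assumption for this sketch only).
WORDS = {"error", "code", "i", "frame"}


def split_candidates(identifier):
    """Return the candidate splits for an UPPERlower identifier."""
    m = re.fullmatch(r"([A-Z]+)([a-z]+)", identifier)
    if not m or len(m.group(1)) < 2:
        return [[identifier]]
    upper, lower = m.groups()
    # Either the whole uppercase run is one word, or its last letter
    # starts the following capitalized word.
    return [[upper, lower], [upper[:-1], upper[-1] + lower]]


def is_accepted(identifier, words=WORDS):
    """Accept the identifier if every part of some candidate split is known."""
    return any(all(part.lower() in words for part in candidate)
               for candidate in split_candidates(identifier))


print(split_candidates("ERRORCode"))
for candidate in ["ERRORcode", "ERRORCode", "IFrame", "iframe"]:
    print(candidate, is_accepted(candidate))
```

In this sketch iframe is rejected because it contains no case change to split on and is not itself in the toy word list, while IFrame is accepted through the (I, Frame) candidate.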

Feb 06 '23 15:02 Jason3S