vscode-spell-checker

Spellcheck only comments

Open haraldF opened this issue 6 years ago • 27 comments

Would be nice to have an option to spell check only comments, independent of the language.

This allows documentation writers to get spell checking without drowning in warnings from badly written code.

haraldF avatar Jul 17 '17 08:07 haraldF

There is a way to do it, but it has to be configured separately for each language. What languages are you using? I'll see if I can give you an example.

The idea is to only include text for matching via a regex.

Here is an example that can be added to your user or workspace settings.json file.

    "cSpell.languageSettings": [
        // This one works with python
        {
            "languageId": "python",
            "includeRegExpList": [
                "/#.*/",
                "/('''|\"\"\")[^\\1]+?\\1/g"
            ]
        },
        // this one works with javascript, C, typescript, etc, 
        // but you need to copy it and change the language id.
        {
            "languageId": "javascript",
            "includeRegExpList": [
               "CStyleComment"
            ]
        }
    ]

This is a version that works in a cSpell file:

    "languageSettings": [
        // This one works with python
        {
            "languageId": "python",
            "includeRegExpList": [
                "/#.*/",
                "/('''|\"\"\")[^\\1]+?\\1/g"
            ]
        },
        // this one works with javascript, C, typescript, etc, 
        // but you need to copy it and change the language id.
        {
            "languageId": "javascript",
            "includeRegExpList": [
               "CStyleComment"
            ]
        }
    ]
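
To see what the two Python patterns pick up, here is a minimal TypeScript sketch. It is not how cSpell itself applies the settings: the sample Python source and the variable names are made up for illustration, and the patterns are simply the settings strings after JSON unescaping. One detail worth knowing: inside a character class, `\1` is not a backreference in JavaScript (without the `u` flag it is read as the character U+0001), so `[^\1]+?` effectively means "any character, including newlines, as few as possible".

```typescript
// "/#.*/" and "/('''|\"\"\")[^\\1]+?\\1/g" from the settings above.
// The "g" flag on the first one is added here so that all matches are returned.
const pyLineComment = new RegExp("#.*", "g");
const pyBlockString = new RegExp("('''|\"\"\")[^\\1]+?\\1", "g");

// Made-up Python source with deliberate misspellings.
const pySource = [
  "def add(a, b):",
  '    """Return the summ of a and b.',
  "",
  '    Works on intgers."""',
  "    total = a + b  # compute teh result",
  "    return total",
].join("\n");

// Only these pieces of text would be handed to the spell checker.
const pyComments: string[] = pySource.match(pyLineComment) || [];
const pyDocstrings: string[] = pySource.match(pyBlockString) || [];

console.log(pyComments);   // the single line comment
console.log(pyDocstrings); // the whole multi-line docstring
```

Identifiers such as `total` fall outside both matches, which is the point of an include list.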

Jason3S avatar Jul 17 '17 18:07 Jason3S

Thanks, Jason, for the quick reply. I'm well aware that I can maintain such a file myself for the languages that I'm using; I just wonder whether it would be smarter to support such a feature out of the box for the convenience of documentation authors. If it's too complex, feel free to close this bug report.

haraldF avatar Jul 22 '17 15:07 haraldF

I agree, it would be useful.

I don't have an easy way to detect comments in each language. At the moment, the only way is to add include expressions.

I have been playing with the idea of reading TextMate colorizer files and trying to glean the meaning from those. But I don't have a lot of time to work on this.

Jason3S avatar Jul 23 '17 14:07 Jason3S

@Jason3S can you provide a language-specific expression for golang? Thanks. Also +1 for this feature.

raffaelespazzoli avatar May 04 '19 01:05 raffaelespazzoli

It might be worth noting that combining the above with the strings expression suggested in #116 and in the documentation worked well. Also, it seems language ids can be listed as a comma-separated list rather than copying the whole block (not in the docs, but seen in #116 and seems to work when testing). All together, this worked well enough to keep me using Code Spell Checker 😄

"cSpell.languageSettings": [
    // This one works with Python
    {
        "languageId": "python",
        "includeRegExpList": [
            "/#.*/",
           "/('''|\"\"\")[^\\1]+?\\1/g",
            "strings"
        ]
    },
    // This one works with JavaScript, TypeScript, etc.
    {
        "languageId": "javascript,typescript",
        "includeRegExpList": [
            "CStyleComment",
            "strings"
        ]
    },
    // Use with cpp or c files
    {
        "languageId": "cpp,c",
        // Turn off compound words, because it is only checking strings.
        "allowCompoundWords": false,
        // Only check comments and strings
        "includeRegExpList": [
            "CStyleComment",
            "string"
        ],
        // Exclude includes, because they are also strings.
        "ignoreRegExpList": [
            "/#include.*/"
        ]
    }
]
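
The C/C++ block combines an include list with an ignore list. The sketch below shows one way to think about that combination: text is checked only where an include pattern matches, minus anything an ignore pattern matches. This is an assumption about the behavior for illustration, not cSpell's actual implementation; `cStyleComment` and `quotedString` are simplified stand-ins for the built-in `CStyleComment` and `string` patterns, and `markRanges` and `textToCheck` are hypothetical helpers.

```typescript
// Simplified stand-ins (assumptions) for the built-in patterns.
const cStyleComment = /\/\*[\s\S]*?\*\/|\/\/.*/g;
const quotedString = /"(?:[^"\\\n]|\\.)*"/g;
// The ignore pattern from the settings above.
const includeDirective = /#include.*/g;

// Set mask[i] to `value` for every character covered by a match.
// Only the pattern source is reused; other flags are dropped in this sketch.
function markRanges(text: string, pattern: RegExp, mask: boolean[], value: boolean): void {
  const re = new RegExp(pattern.source, "g");
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    for (let i = m.index; i < m.index + m[0].length; i++) mask[i] = value;
    if (m[0] === "") re.lastIndex++; // guard against empty matches
  }
}

// Included ranges first, then ignored ranges carved out of them.
function textToCheck(text: string, include: RegExp[], ignore: RegExp[]): string[] {
  const mask: boolean[] = [];
  for (let i = 0; i < text.length; i++) mask.push(false);
  include.forEach((p) => markRanges(text, p, mask, true));
  ignore.forEach((p) => markRanges(text, p, mask, false));
  const runs: string[] = [];
  let start = -1;
  for (let i = 0; i <= text.length; i++) {
    const on = i < text.length && mask[i];
    if (on && start < 0) start = i;
    if (!on && start >= 0) {
      runs.push(text.slice(start, i));
      start = -1;
    }
  }
  return runs;
}

const cSource = [
  '#include "stdio.h"',
  "// Prints a greting",
  'int main(void) { puts("Helo world"); return 0; }',
].join("\n");

const checked = textToCheck(cSource, [cStyleComment, quotedString], [includeDirective]);
console.log(checked);
```

In this sample the header name is a string, so it would be included, but the `#include` ignore pattern removes it again; only the comment and the `puts` argument remain.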

jmcker avatar Jun 13 '19 16:06 jmcker

@jmcker great example.

Jason3S avatar Jun 15 '19 08:06 Jason3S

@Jason3S @jmcker thanks for these snippets. I'm really bad at regex and have been trying to disable spell checking of inline code (between backticks) in comments. Any idea how to do that? Thanks :)

YannDubs avatar Sep 15 '19 22:09 YannDubs

Isn't it possible to use the syntax highlighting category determined by grammar for the current language (see https://code.visualstudio.com/api/language-extensions/syntax-highlight-guide) in order to activate/deactivate spell checking rules in a more general way? This seems more convenient than repeating regular expressions for each language, although it's cool to have the regex rules for corner cases.

memeplex avatar Oct 06 '19 22:10 memeplex

> Isn't it possible to use the syntax highlighting category determined by grammar for the current language (see https://code.visualstudio.com/api/language-extensions/syntax-highlight-guide) in order to activate/deactivate spell checking rules in a more general way? This seems more convenient than repeating regular expressions for each language, although it's cool to have the regex rules for corner cases.

It isn't possible for an extension to get access to the syntax-highlighting. I do have a long term plan on how to do this. It is more of a time constraint.

Jason3S avatar Oct 23 '19 11:10 Jason3S

> @Jason3S @jmcker thanks for these snippets. I'm really bad at regex and have been trying to disable spell checking of inline code (between backticks) in comments. Any idea how to do that? Thanks :)

Please open a new issue with your exact challenge. Include some examples and refer to this issue. I will see if I can help you out.

Jason3S avatar Oct 23 '19 11:10 Jason3S

@YannDubs See comment above.

Jason3S avatar Oct 23 '19 11:10 Jason3S

> golang

@raffaelespazzoli Here are my cSpell settings for golang (add to your user settings.json):

"cSpell.languageSettings": [
        // GoLang
        // Set what strings to check (see https://github.com/streetsidesoftware/vscode-spell-checker/issues/107)
        {
            "languageId": "go",
            // Turn off compound words, because it is only checking strings.
            "allowCompoundWords": false,
            // Only check comments and strings
            "includeRegExpList": [
                "CStyleComment",
                "string"
            ],
            // Exclude imports, because they are also strings.
            "ignoreRegExpList": [
                // ignore multiline imports
                "import\\s*\\((.|[\r\n])*?\\)",
                // ignore single line imports
                "import\\s*.*\".*?\""
            ],
        }
    ]

Also you can fork it here https://gist.github.com/r3code/21d1e9a3f862ad865808f07225b59068
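
As a quick check of what the two Go import patterns cover, here is a small TypeScript sketch. The Go source is made up, and the `g` flag is added only so that every match is listed. The pattern strings are copied from the settings above; note that `\r` and `\n` in the first one are real carriage-return and line-feed characters once the JSON string is decoded, which is what lets it span lines.

```typescript
// The two ignore patterns from the settings above, as decoded strings.
const goMultiLineImport = new RegExp("import\\s*\\((.|[\r\n])*?\\)", "g");
const goSingleLineImport = new RegExp("import\\s*.*\".*?\"", "g");

// Made-up Go source with both import styles.
const goSource = [
  "package main",
  "",
  "import (",
  '\t"fmt"',
  '\t"strings"',
  ")",
  "",
  'import "os"',
  "",
  "// Greeter prints a greting.",
  'func main() { fmt.Println(strings.ToUpper("helo"), os.Args) }',
].join("\n");

const multiLine: string[] = goSource.match(goMultiLineImport) || [];
const singleLine: string[] = goSource.match(goSingleLineImport) || [];

console.log(multiLine);  // the parenthesized import block
console.log(singleLine); // the one-line import
```

The comment and the `"helo"` string are untouched by either pattern, so they would still be checked.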

r3code avatar Jul 28 '20 08:07 r3code

In case you need the same in Haskell, I'm not great at regex but trial and error led me to something that does the job:

        {
            "languageId": "haskell",
            "includeRegExpList": [
                "/--.*/",
                "{-(.|\n)*?-}",
                "string"
            ]
        }
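
For anyone adapting the Haskell patterns, this TypeScript sketch shows what they match on a made-up sample. The `g` flag is added here so all matches are listed, and `{` is treated as a literal character because it is not followed by a valid quantifier.

```typescript
// "/--.*/" and "{-(.|\n)*?-}" from the settings above.
const hsLineComment = new RegExp("--.*", "g");
const hsBlockComment = new RegExp("{-(.|\n)*?-}", "g");

// Made-up Haskell source with deliberate misspellings.
const hsSource = [
  "{- Module hedder",
  "   spanning two lines -}",
  "main :: IO ()",
  'main = putStrLn "helo" -- prints a greting',
].join("\n");

const hsLine: string[] = hsSource.match(hsLineComment) || [];
const hsBlock: string[] = hsSource.match(hsBlockComment) || [];

console.log(hsLine);  // the trailing line comment
console.log(hsBlock); // the two-line block comment
```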

harisont avatar Dec 30 '20 17:12 harisont

> It isn't possible for an extension to get access to the syntax-highlighting. I do have a long term plan on how to do this. It is more of a time constraint.

@memeplex I had the same question. Coming from TextMate, and having written a lot of custom bundles, I was really surprised that VSCode does not make this available to extension authors. Here's an issue that explains the limitation and tracks the request: https://github.com/microsoft/vscode/issues/580

ryanfitzer avatar Mar 02 '21 22:03 ryanfitzer

I got to know how to spellcheck only comments and strings by reading this issue. Thank you very much!

But I think it should be clearer in the README. E.g., it doesn't mention the option "includeRegExpList".

Also, the include example in the README explicitly says that only comments and block strings will be checked for spelling, but it does not work very well (see below).

[image]

Diogo-Rossi avatar Jul 16 '21 00:07 Diogo-Rossi

@Diogo-Rossi, it looks like the regexp in the example was wrong. It was too greedy and also matched the expression itself.

It should be:

# cSpell:includeRegExp #.*
# cSpell:includeRegExp /(["]{3}|[']{3})[^\1]*?\1/g
# only comments and block strings will be checked for spelling.
def sum_it(self, seq):
    """This is checked for spelling"""
    variabele = 0
    alinea = 'this is not checked'
    for num in seq:
        # The local state of 'value' will be retained between iterations
        variabele += num
        yield variabele

For Python, you can now use:

# cspell:includeRegExp comments

See: cspell-dicts/cspell-ext.json at main · streetsidesoftware/cspell-dicts

Jason3S avatar Sep 29 '21 14:09 Jason3S

I added this for shellscript and python. It still needs work. Since my code also has commands as strings, I include only strings that start with a capital letter, which in most cases are English sentences.

    "cSpell.languageSettings": [
      {
        "languageId": "shellscript",
        "includeRegExpList": [
            "/#.*/",
            "/('|\")[A-Z][^\\1]+?\\1/g",
        ]
      },
      {
        "languageId": "python",
        "includeRegExpList": [
          "/#.*/",
          "/('|\")[A-Z][^\\1]+?\\1/g",
          "/('''|\"\"\")[^\\1]+?\\1/g",
        ]
      },
    ],

ErezArbell avatar Oct 06 '22 10:10 ErezArbell

@ErezArbell,

I suggest testing your regular expressions on https://regex101.com. Use the ECMAScript (JavaScript) flavor.

The spell checker has a feature you can use to see your expressions:

  1. Enable the Experimental Regexp View [image]
  2. Click on [image]

You should see all the patterns: [image]

Jason3S avatar Oct 06 '22 11:10 Jason3S

Please note: this expression will only match ALL-CAPS WORDS.

"/('|\")[A-Z][^\\1]+?\\1/g"

Jason3S avatar Oct 06 '22 11:10 Jason3S

No. It will match strings whose first letter is capital. As I wrote above, this is on purpose since there are strings that contain commands.
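
A small TypeScript sketch makes the behavior concrete. The shell lines are made up; the pattern is the one from the settings after JSON unescaping. `[A-Z]` constrains only the first character after the opening quote, so mixed-case sentences match as well as all-caps words.

```typescript
// "/('|\")[A-Z][^\\1]+?\\1/g" from the settings above.
const capitalizedString = new RegExp("('|\")[A-Z][^\\1]+?\\1", "g");

// Made-up shell script: two messages and one command kept in a string.
const shellSource = [
  'echo "Installing dependancies"',
  "run 'ls -la'",
  'echo "ERROR occured"',
].join("\n");

const capitalized: string[] = shellSource.match(capitalizedString) || [];
console.log(capitalized); // both messages, but not the 'ls -la' command
```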

ErezArbell avatar Oct 06 '22 17:10 ErezArbell

> No. It will match strings whose first letter is capital. As I wrote above, this is on purpose since there are strings that contain commands.

My mistake.

Jason3S avatar Oct 06 '22 18:10 Jason3S

Does anyone have a solution for matching only the values in JSON objects? E.g., in {"description": "Catch spellin here"}, the key is typically part of an API contract and will be validated separately, but the value needs spell checking.

jace avatar Dec 13 '23 14:12 jace

This works for ignoring JSON object keys:

    {
      "languageId": "json,jsonc",
      "allowCompoundWords": false,
      "ignoreRegExpList": ["/\"[^\"]*\":/"]
    },

The default list of ignores doesn't appear to be overridden by this; they continue to be ignored.
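
To illustrate what the key pattern removes, here is a minimal TypeScript sketch. The sample object extends the one from the question with a made-up second key, and the `g` flag is added so that every key is found, which is an assumption about how matches are collected.

```typescript
// "/\"[^\"]*\":/" from the settings above, after JSON unescaping.
const jsonKeyPattern = new RegExp("\"[^\"]*\":", "g");

const jsonText = '{"description": "Catch spellin here", "title": "Anual report"}';

// Keys (with their colon) are what the ignore pattern covers...
const keys: string[] = jsonText.match(jsonKeyPattern) || [];
// ...so only the values are left to be checked.
const withoutKeys = jsonText.replace(jsonKeyPattern, "");

console.log(keys);
console.log(withoutKeys);
```

A value is never matched on its own because the pattern requires a colon right after the closing quote.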

jace avatar Dec 13 '23 15:12 jace

After some tweaking, I've found Python settings that work for me:

    {
      "languageId": "python",
      "allowCompoundWords": false,
      "includeRegExpList": ["comments", "string"],
      "ignoreRegExpList": [
        // Ignore single-quoted strings ('symbols' and '''embedded code like SQL''')
        "/'.*?'/g",
        "/'''.+?'''/gm",
        // Ignore code in braces in f-strings: f"...{code} ... {code}", f'...{code}'
        "/(?<=(?:f|rf|fr)(?:\"[^\"]*|'[^']*))\\{.*?\\}/g",
        // Ignore reStructuredText code samples (indented block after line ending with
        // `::`), but don't ignore `.. directive::`. If your documentation uses
        // `.. code-block:: lang`, remove the `(?<!...)`
        "/(?<!\\s*\\.\\..*)::$\\n+(\\s+).*\\n(?:^\\n|^\\1.*\\n)*/gm",
        // Ignore linter directive comments
        "/#\\s*(flake8:|isort:|noqa:|nosec\\s|pragma:|pylint:|pyright:|type:).*/i",
        // Ignore words within `backticks` or ``backticks``, used for references
        "/(`{1,2}).*?\\1/g",
        // Ignore reStructuredText parameter names and types
        "/:(param|type).*?:/"
      ]
    },

I haven't figured out how to ignore only the reStructuredText .. code-block:: <lang> directive.
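
Two of the ignore patterns above are easier to follow with a worked example. The TypeScript sketch below runs the f-string brace pattern and the backtick pattern on a made-up Python snippet. It assumes a JavaScript engine with variable-length lookbehind support, which current Node.js versions have.

```typescript
// f-string brace pattern and backtick pattern from the settings above,
// after JSON unescaping.
const fStringBraces = new RegExp("(?<=(?:f|rf|fr)(?:\"[^\"]*|'[^']*))\\{.*?\\}", "g");
const backtickSpan = new RegExp("(`{1,2}).*?\\1", "g");

// Made-up Python lines: an f-string, a plain string, and a comment.
const pySample = [
  'msg = f"Hello {user.name}, you have {count} items"',
  'note = "plain {braces} stay"',
  "# See ``my_func`` and `other_var` for detials",
].join("\n");

const braces: string[] = pySample.match(fStringBraces) || [];
const backticked: string[] = pySample.match(backtickSpan) || [];

console.log(braces);     // only the braces inside the f-string
console.log(backticked); // both the double- and single-backtick spans
```

The lookbehind is what limits the brace pattern to f-strings: `{braces}` in the plain string is not preceded by `f"` or `f'`, so it is left alone.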

jace avatar Dec 14 '23 10:12 jace

These are my settings: for Python, C++, and C, I spell check only comments and strings; for .json files, I spell check only comments (because, e.g., settings.json contains a lot of strings that raise plenty of messages in the Problems section).

    "cSpell.languageSettings": [
        {
            "languageId": "python,cpp,c",
            "includeRegExpList": [
                // For Python
                "comments",
                "strings",
                // For C++ and C
                "CStyleComment",
                "string"
            ]
        },
        {
            "languageId": "jsonc", // (.json with comments) because most "commands" here are strings
            "includeRegExpList": [
                "CStyleComment",
            ]
        }
    ],

Notice how you can have just one block for all your languages (add more, separated by commas) and just add each language's own way of telling it to check comments and strings (only the JSON case has a separate block, because we don't want it to check strings there).

konstabark avatar Mar 03 '24 00:03 konstabark

There has not been much progress on the VSCode side in providing access to the syntax tree, but there are a couple of projects that parse from a TextMate or Tree-sitter grammar, which may improve performance and simplify development compared with regular expressions:

  • https://github.com/vsce-toolroom/vscode-textmate-languageservice
  • https://github.com/microsoft/vscode-anycode

memeplex avatar Mar 03 '24 16:03 memeplex