vscode-spell-checker
Ignore accents when spell checking
It would be nice to have a setting which tells the dictionary to ignore (or transform) special characters (and accents).
Most programming languages support only ASCII variable names. For example, the word "actualization" is "atualização" in Portuguese (note that ç and ã are both special characters).
The check gives an error if I write atualizacao, but it should work.
Maybe making a transform pattern available would also be a good idea, but I don't know how this is implemented, so...
You make a really good point. I'll have to think about how this could be done.
I agree, this is cumbersome for multiple languages. I think it would be easiest to include another plugin, e.g. french-no-accents, which could be configured like any other language.
And it is really easy to feed the dictionary through a transform function for the most affected languages.
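A minimal sketch of such a transform, assuming a plain Python word list rather than cspell's actual dictionary build pipeline: each word is decomposed with Unicode NFD, the combining marks are dropped, and both the accented and the accent-free forms are kept. The names `strip_accents` and `expand_dictionary` are hypothetical.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Remove accents by decomposing (NFD) and dropping combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", word)
    without_marks = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", without_marks)


def expand_dictionary(words):
    """Hypothetical build step: keep every original word and add its accent-free form."""
    expanded = set(words)
    expanded.update(strip_accents(w) for w in words)
    return sorted(expanded)


print(expand_dictionary(["atualização", "café", "casa"]))
# ['atualizacao', 'atualização', 'cafe', 'café', 'casa']
```

Words that have no accents (like "casa") pass through unchanged, so the set only grows by the accent-free variants.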
I am working on a way to do this.
@Jason3S did you progress on this in any way?
I'm also interested in this feature and willing to help in any way I can!
Same here, @Jason3S, if we can help. This is quite vital for Spanish, for example.
@eturino I agree. Removing accents becomes vital.
The current version of cspell doesn't support stripping accents. The next version, cspell v5, will support it.
Hi! Any news here?
@mhagnumdw,
Which language are you needing the most? The dictionary needs to be re-built to support this feature.
Hi! pt-BR
es-ES here
+1 for pt-BR
Does that mean that there needs to be a new dictionary with all the words without accents?
The existing dictionary will be rebuilt and the version bumped.
It will contain the words with and without accents.
The caseSensitive setting will be used to control the check.
See the Spanish dictionary example: cspell-dicts/dictionaries/es_ES/tests
If caseSensitive is true, both case and accents are checked. If caseSensitive is false, case does not matter and accents can be removed. Mixing accents is not allowed. For example:
- café is good.
- cafe is also fine.
- cafë is not correct.
@eturino,
I have released Spanish V2. Please try it out.
@lsfgrd, @mhagnumdw,
I have released Brazilian V2. Please try it out.
There are two modes:
- case-insensitive (default): words can be lowercase and without accents.
- case-sensitive: words must match case and accents.
Note: the wrong accent is not allowed. If é was expected, then e or é is allowed, but ë is not.
Note: you cannot mix accents in a word. Either the entire word has the correct accents, or zero accents.
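The two modes can be modeled roughly as follows. This is a simplified, hypothetical sketch (the `AccentAwareDictionary` class is not part of cspell, and it ignores cspell's other case rules such as allowing capitalized words); it only shows how "correct accents or zero accents, never a wrong or partial accent" can be enforced.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Drop combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize(
        "NFC", "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    )


def has_accents(word: str) -> bool:
    return strip_accents(word) != word


class AccentAwareDictionary:
    """Hypothetical model of the two checking modes described above."""

    def __init__(self, words):
        self.exact = set(words)
        self.lower = {w.lower() for w in words}
        self.stripped = {strip_accents(w.lower()) for w in words}

    def is_valid(self, word: str, case_sensitive: bool) -> bool:
        if case_sensitive:
            # Case and accents must match a dictionary entry exactly.
            return word in self.exact
        lowered = word.lower()
        if lowered in self.lower:
            # Correct accents, any case.
            return True
        # Otherwise only a completely accent-free spelling is accepted,
        # so a wrong accent ("cafë") or a partial one ("atualizaçao") fails.
        return not has_accents(lowered) and lowered in self.stripped
```

With the dictionary `["café"]`, case-insensitive mode accepts café, CAFÉ and cafe but rejects cafë, while case-sensitive mode accepts only café.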
It works great @Jason3S, thank you so much for this 😀
Works, @Jason3S! Cheers!
It seems that this is already working for Spanish too. That said, the behavior is very confusing and I was about to report a bug:
- by default incorrectly spelled words are considered right (camion and camión)
- the option to toggle this is misleadingly named "case sensitive"
- it is not documented in any place but here, AFAICS
At least the docstring for the setting should be more informative (it says "words must match case rules").
@memeplex,
Thank you for the feedback. I can see how it is confusing and does not work as expected.
Why not add an alias for caseSensitive, something more general like strictSpelling or similar, and deprecate caseSensitive?
From #1060, moving the discussion here:
I should close this issue since the spell checker already supports this feature.
Does it support it properly, so that context is considered, or is it just an extra word list / collation that applies everywhere? As I said in https://github.com/streetsidesoftware/cspell/issues/1060#issuecomment-1006197820:
However, these additional words must only be considered correct in code (specifically, in identifiers). It should NOT be accent-insensitive inside:
- strings in code (very important! these may be displayed to the user!)
- string values in data files like .json, .csv, .yml, etc (same as above)
- documentation comments, e.g. jsdoc, xmldoc, /** ... */, etc., which should have proper accentuation
- optionally, other comments (if the user wants to be more strict; by default it should be insensitive)
- plain text parts of markup files like .md, .html, .xml
- optionally, plain text file types, such as .txt, .log, etc (i.e., you should be able to override the setting for specific file types)
Because if it doesn't distinguish between code and display text/strings, it might actually do more harm than good, since a misspelled word could be overlooked and displayed to end users.
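The context-aware behavior requested above could be expressed as a small policy function. This is a hypothetical sketch of the proposal, not how the spell checker works (it currently decides per file); it assumes a tokenizer has already classified each word, and the names `TokenKind` and `accents_required` are invented for illustration.

```python
from enum import Enum, auto


class TokenKind(Enum):
    """Hypothetical token classes a context-aware checker might distinguish."""
    IDENTIFIER = auto()   # variable, function and type names
    STRING = auto()       # string literals in code, string values in data files
    DOC_COMMENT = auto()  # jsdoc, xmldoc, /** ... */ comments
    COMMENT = auto()      # ordinary comments
    TEXT = auto()         # prose in markup (.md, .html, .xml) or plain-text files


def accents_required(kind: TokenKind, *, strict_comments: bool = False,
                     strict_text: bool = True) -> bool:
    """Return True if a word of this kind must carry its correct accents."""
    if kind is TokenKind.IDENTIFIER:
        # Identifiers are often ASCII-only, so accent-free spellings are accepted.
        return False
    if kind is TokenKind.COMMENT:
        # Lenient by default; users can opt in to strict checking.
        return strict_comments
    if kind is TokenKind.TEXT:
        # Strict by default; the caller may relax it per file type (e.g. .txt, .log).
        return strict_text
    # Strings and documentation comments may be shown to users: always strict.
    return True
```

A caller would choose `strict_text` per file type, leaving it True for markup and optionally setting it to False for .txt or .log files.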
At the moment, case and accent sensitivity is at the file level. Meaning, the entire file is checked for accent correctness or not. A future enhancement is to make the spell checker context aware (meaning it knows about strings vs code).
It is currently possible to use languageSettings and overrides to target specific files and turn checking for accent correctness on or off. See https://github.com/streetsidesoftware/cspell/issues/1060#issuecomment-1007283031
Most language dictionaries are case sensitive by default. When you enable the dictionary for a file, the entire file is checked for accent correctness. You must explicitly turn off caseSensitive at the languageSettings level. This was done to avoid the accidental assumption of correctness mentioned above.
I can't get this to work. Even with "cSpell.caseSensitive": false explicitly set in my VS Code settings, I still get corrections when I write accented words without their accents, e.g.:
@nataliafonseca,
It is not possible to force the setting at the global level. Please use languageSettings and overrides to target specific files and turn checking for accent correctness on or off. See https://github.com/streetsidesoftware/cspell/issues/1060#issuecomment-1007283031
I also advise not to use allowCompoundWords; it hides a lot of issues.
Thank you so much! It's working perfectly now 😀