vscode-spell-checker

Ignore accents when spell checking

Open cristovao-trevisan opened this issue 6 years ago • 26 comments

It would be nice to have a setting which tells the dictionary to ignore (or transform) special characters (and accents).

Most programming languages support only ASCII variable naming. For example, the word "actualization" is "atualização" in Portuguese (note that ç and ã are both special characters).

The check gives an error if I write atualizacao, but it should work.

Maybe making a transform pattern available would also be a good idea, but I don't know how this is implemented, so...
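A minimal sketch of the kind of transform suggested above, in TypeScript. It assumes accents can be removed by decomposing each character (Unicode NFD) and dropping the combining marks. `stripAccents` is a hypothetical helper for illustration, not part of cspell.

```typescript
// Hypothetical helper (an assumption for illustration, not a cspell API).
function stripAccents(word: string): string {
  // NFD splits "ç" into "c" + U+0327 and "ã" into "a" + U+0303.
  // The regex then removes every combining mark in U+0300..U+036F.
  return word.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// The identifier from the example: the accented word and its ASCII spelling.
const asciiForm = stripAccents("atualização");
console.log(asciiForm);
```

Letters that have no decomposed form (for example "ø" or "ß") pass through unchanged, so a real transform would likely need a per-language mapping on top of this.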

cristovao-trevisan avatar Dec 27 '17 16:12 cristovao-trevisan

You make a really good point. I'll have to think about how this could be done.

Jason3S avatar Dec 29 '17 10:12 Jason3S

I agree, this is cumbersome for multiple languages. I think that it would be easiest to include another plugin, e.g. french-no-accents, and it could be configured like any other language.

Mangatt avatar Oct 30 '18 10:10 Mangatt

And it is really easy to feed a dictionary through a transform function for most affected languages.

Mangatt avatar Oct 30 '18 10:10 Mangatt

I am working on a way to do this.

Jason3S avatar Jun 15 '19 08:06 Jason3S

@Jason3S did you progress on this in any way?

I'm also interested in this feature and willing to help in any way I can!

lsfgrd avatar Sep 27 '19 14:09 lsfgrd

Same here @Jason3S if we can help, this is quite vital for Spanish for example.

eturino avatar May 18 '20 21:05 eturino

@eturino I agree. Removing accents becomes vital.

The current version of cspell doesn't support stripping accents. The next version, cspell v5 will support it.

Jason3S avatar May 19 '20 12:05 Jason3S

Hi! Any news here?

mhagnumdw avatar Jul 26 '21 14:07 mhagnumdw

@mhagnumdw,

Which language do you need the most? The dictionary needs to be re-built to support this feature.

Jason3S avatar Aug 12 '21 16:08 Jason3S

@mhagnumdw,

Which language are you needing the most? The dictionary needs to be re-built to support this feature.

Hi! pt-BR

mhagnumdw avatar Aug 13 '21 11:08 mhagnumdw

es-ES here

eturino avatar Aug 13 '21 13:08 eturino

@mhagnumdw, Which language are you needing the most? The dictionary needs to be re-built to support this feature.

Hi! pt-BR

+1

Does that mean that there needs to be a new dictionary with all the words without accents?

lsfgrd avatar Aug 13 '21 19:08 lsfgrd

Does that mean that there needs to be a new dictionary with all the words without accents?

The existing dictionary will be re-built and the version bumped.

It will contain the words with and without accents.
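As a rough illustration of what "with and without accents" could mean for a word list, here is a TypeScript sketch. The thread does not show how cspell actually rebuilds its dictionaries, so `expandWithAccentless` is a hypothetical function and the approach is an assumption.

```typescript
// Hypothetical build step (an assumption; cspell's real dictionary
// compiler is not described in this thread).
function expandWithAccentless(words: string[]): Set<string> {
  const expanded = new Set<string>();
  for (const word of words) {
    // Keep the correctly accented form.
    expanded.add(word);
    // Add the form with every accent removed (NFD + drop combining marks).
    expanded.add(word.normalize("NFD").replace(/[\u0300-\u036f]/g, ""));
  }
  return expanded;
}

const rebuilt = expandWithAccentless(["café", "camión"]);
console.log(Array.from(rebuilt));
```

Only the fully accented and the fully unaccented spellings are added, so a spelling with a different accent never enters the list.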

The caseSensitive setting will be used to control the check.

See Spanish dictionary example: cspell-dicts/dictionaries/es_ES/tests

If caseSensitive is true, it means that both case and accents are checked. If caseSensitive is false, it means that case does not matter and accents can be removed. Mixing accents is not allowed. For example:

  • café - is good.
  • cafe - is also fine.
  • cafë - is not correct.
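The rule above can be sketched as a small TypeScript check. This is a model of the described behavior under stated assumptions, not cspell's implementation: `isAccepted` and `stripAccents` are hypothetical names, and the dictionary is a plain array of correctly accented words.

```typescript
// Hypothetical helpers modelling the rule described above (not cspell code).
function stripAccents(word: string): string {
  return word.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

function isAccepted(word: string, dictionary: string[], caseSensitive: boolean): boolean {
  const candidate = word.normalize("NFC");
  // Exact match is always fine.
  if (dictionary.some((entry) => entry.normalize("NFC") === candidate)) return true;
  // Strict mode: case and accents must both match.
  if (caseSensitive) return false;
  const lower = candidate.toLowerCase();
  return dictionary.some((entry) => {
    const entryLower = entry.normalize("NFC").toLowerCase();
    // Accept either all the correct accents or no accents at all.
    // A wrong or partial set of accents matches neither form.
    return lower === entryLower || lower === stripAccents(entryLower);
  });
}

const dictionary = ["café", "atualização"];
console.log(isAccepted("cafe", dictionary, false));
console.log(isAccepted("cafë", dictionary, false));
```

Comparing against only two forms per entry (fully accented, fully stripped) is what rules out mixed spellings such as a word with one of its two accents.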

Jason3S avatar Aug 13 '21 21:08 Jason3S

@eturino,

I have released Spanish V2. Please try it out.

Jason3S avatar Sep 03 '21 20:09 Jason3S

@lsfgrd , @mhagnumdw ,

I have released Brazilian V2. Please try it out.

There are two modes:

  1. non-case sensitive (default) - words can be lower case and without accents.
  2. case sensitive - words must match case and accents.

Note: the wrong accent is not allowed. If é was expected, then e or é is allowed, but ë is not.

Jason3S avatar Sep 03 '21 21:09 Jason3S

Note: you cannot mix accents in a word. Either the entire word has the correct accents, or zero accents.

Jason3S avatar Sep 03 '21 21:09 Jason3S

It works great @Jason3S, thank you so much for this 😀

lsfgrd avatar Sep 04 '21 02:09 lsfgrd

works @Jason3S ! Cheers!

eturino avatar Sep 06 '21 16:09 eturino

It seems that this is already working for Spanish too. That said, the behavior is very confusing and I was about to report a bug:

  • by default incorrectly spelled words are considered right (camion and camión)
  • the option to toggle this is misleadingly named "case sensitive"
  • it is not documented in any place but here, AFAICS

At least the docstring for the setting should be more informative (it says "words must match case rules").

memeplex avatar Nov 01 '21 12:11 memeplex

@memeplex,

Thank you for the feedback. I can see how it is confusing and does not work as expected.

Jason3S avatar Nov 01 '21 13:11 Jason3S

Why not add an alias for caseSensitive, something more general like strictSpelling or similar, and deprecate caseSensitive?

memeplex avatar Nov 30 '21 14:11 memeplex

From #1060, moving discussion to here:

I should close this issue since the spell checker already supports this feature.

Does it support it properly, so that context is considered, or is it just extra word list / collation that applies everywhere? Like I said in https://github.com/streetsidesoftware/cspell/issues/1060#issuecomment-1006197820:

However, these additional words must only be considered correct in code (specifically, in identifiers). It should NOT be accent-insensitive inside:

  • strings in code (very important! these may be displayed to the user!)
  • string values in data files like .json, .csv, .yml, etc (same as above)
  • documentation comments, e.g: jsdoc, xmldoc, /** ... */ etc, which should have proper accentuation
  • optionally, other comments (if the user wants to be more strict; by default it should be insensitive)
  • plain text parts of markup files like .md, .html, .xml
  • optionally, plain text file types, such as .txt, .log, etc (i.e., you should be able to override the setting for specific file types)

Cause if it doesn't distinguish between code and display text/strings, it might actually do more harm than good, since a misspelled word can end up being overlooked and displayed for end-users.

geekley avatar Jan 07 '22 01:01 geekley

At the moment, case and accent sensitivity is at the file level. Meaning, the entire file is checked for accent correctness or not. A future enhancement is to make the spell checker context aware (meaning it knows about strings vs code).

It is currently possible using languageSettings and overrides to target specific files to turn on / off checking for accent correctness. See https://github.com/streetsidesoftware/cspell/issues/1060#issuecomment-1007283031

Most language dictionaries are case sensitive by default. When you enable the dictionary for a file, the entire file is checked for accent correctness. You must explicitly turn off caseSensitive at a languageSettings level. This was done to avoid accidental assumption of correctness as mentioned above.
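A hedged sketch of what such a configuration might look like, written as a TypeScript object so it can be type-checked. The keys `languageSettings`, `overrides` and `caseSensitive` come from this thread; `languageId`, `filename`, the chosen values and the local interfaces are assumptions for illustration and should be checked against the linked comment and the cspell documentation.

```typescript
// Minimal local typings for illustration only. These are NOT cspell's
// published types; the field names languageId and filename are assumptions.
interface LanguageSettingSketch {
  languageId: string;
  caseSensitive?: boolean;
}

interface OverrideSketch {
  filename: string;
  caseSensitive?: boolean;
}

interface SettingsSketch {
  languageSettings: LanguageSettingSketch[];
  overrides: OverrideSketch[];
}

const settings: SettingsSketch = {
  languageSettings: [
    // Source code: allow words written without accents.
    { languageId: "typescript", caseSensitive: false },
  ],
  overrides: [
    // Prose files: keep strict case and accent checking.
    { filename: "**/*.md", caseSensitive: true },
  ],
};

// Print the JSON shape that would go into a configuration file.
console.log(JSON.stringify(settings, null, 2));
```

In VS Code settings the same keys would presumably carry the `cSpell.` prefix, as in the `cSpell.caseSensitive` setting mentioned later in this thread.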

Jason3S avatar Jan 07 '22 11:01 Jason3S

I can't get this to work for me. Even with "cSpell.caseSensitive": false explicitly set in my vscode settings, I'm still getting corrections when I use zero accents in accented words. e.g.: example settings

nataliafonseca avatar Mar 15 '22 13:03 nataliafonseca

@nataliafonseca,

It is not possible to force the setting at the global level. Please use languageSettings and overrides to target specific files to turn on / off checking for accent correctness. See https://github.com/streetsidesoftware/cspell/issues/1060#issuecomment-1007283031

I also advise not to use allowCompoundWords, it hides a lot of issues.

Jason3S avatar Mar 15 '22 15:03 Jason3S

@nataliafonseca,

It is not possible to force the setting at the global level. Please use languageSettings and overrides to target specific files to turn on / off checking for accent correctness. See streetsidesoftware/cspell#1060 (comment)

I also advise not to use allowCompoundWords, it hides a lot of issues.

Thank you so much! It's working perfectly now 😀

nataliafonseca avatar Mar 16 '22 01:03 nataliafonseca