Regular expressions in setting "ignored_words"

Open gerardroche opened this issue 9 years ago • 6 comments

It would be nice to be able to use regular expressions in the ignored_words setting.

re: http://stackoverflow.com/questions/28677423/regular-expressions-in-ignored-words-in-sublime-text-3-spell-check

gerardroche avatar Aug 01 '16 16:08 gerardroche

I don't believe hunspell supports regular expressions

wbond avatar Aug 01 '16 20:08 wbond

I'm not entirely sure I understand your answer @wbond, could I trouble you to clarify please? My understanding is that ST must choose what words to pass to hunspell for spellchecking, because it is possible to configure spell checking by scope selector, which hunspell would know nothing about. Therefore, surely ST could check each word against the regular expression to decide whether or not to get hunspell to validate it...

Unless my idea of how the spell checking works is way off... ;)

keith-hall avatar Aug 03 '16 08:08 keith-hall
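The idea in the comment above, filtering words against regular expressions before they reach the spell checker, can be sketched roughly as follows. This is only an illustration: `IGNORE_PATTERNS` and `words_to_check` are hypothetical names, the patterns are made-up examples, and nothing here reflects how Sublime Text actually hands words to Hunspell.

```python
import re

# Hypothetical patterns a user might list in a regex-aware "ignored_words" setting.
IGNORE_PATTERNS = [re.compile(p) for p in (r"\d\w*", r"[A-Z]{2,}")]


def words_to_check(words, patterns=IGNORE_PATTERNS):
    """Return only the words that fully match none of the ignore patterns."""
    return [w for w in words if not any(p.fullmatch(w) for p in patterns)]


# Only the words returned here would be forwarded to the spell checker.
remaining = words_to_check(["Took", "2nd", "HTML", "sesonds"])
```

Using `fullmatch` means a pattern has to cover the whole word, so a pattern such as `\d\w*` ignores "2nd" without also ignoring every word that merely contains a digit somewhere.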

+1

It would also be useful for: https://forum.sublimetext.com/t/how-to-ignore-words-starting-with-numbers-on-the-spellchecking/22755

How to ignore words starting with numbers on the spellchecking?

Some words just start with a number, such as:

Took 0:00:01,91 seconds to run this script.

[Finished in 2.7s]

Sublime marks these as spelling errors. Eclipse has an option to ignore words containing numbers. Is there a similar approach for Sublime Text?

evandrocoan avatar Sep 14 '16 16:09 evandrocoan

@keith-hall Currently Hunspell itself deals with the ignored words. Thus, in essence I was saying there isn't a simple way with the existing tools/implementation to just enable regex-based ignores.

In terms of how we feed text to Hunspell, we don't run regexes on the contents of the buffer, but use a very simple (fast) character scanner and then an optimized unicode lookup table to determine what category a character belongs to. We ignore things like punctuation and other items that don't appear to be words. The "tokens" of a file already have scope from the syntax, so that is a "free" filtering option.

As with many other aspects of the core editing experience in ST, it is heavily focused on performance.

I'm not saying it isn't possible to do (obviously it is), but it would be a more involved change to refactor how we deal with spelling, and probably relatively low in the priority list right now.

wbond avatar Sep 14 '16 20:09 wbond
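For readers unfamiliar with the approach described above, here is a minimal sketch of a category-based character scanner. It is not Sublime Text's implementation: Python's `unicodedata.category` stands in for the optimized lookup table, and the choice of which categories count as word characters (letters, marks, numbers) is an assumption made for illustration.

```python
import unicodedata


def is_word_char(ch):
    # Letters (L*), combining marks (M*) and numbers (N*) are treated as word
    # characters here; this category choice is an assumption.
    return unicodedata.category(ch)[0] in "LMN"


def scan_words(text):
    """Yield (start_index, word) for each maximal run of word characters."""
    start = None
    for i, ch in enumerate(text):
        if is_word_char(ch):
            if start is None:
                start = i
        elif start is not None:
            yield start, text[start:i]
            start = None
    if start is not None:
        yield start, text[start:]


tokens = [word for _, word in scan_words("Took 0:00:01,91 seconds")]
```

Punctuation and whitespace never appear in a token, which matches the description of ignoring items that do not look like words. No regular expression is evaluated at any point in the scan.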

Any chance the scanner could (possibly optionally) ignore words containing numbers at least? That doesn't seem like it would significantly slow things down. At least not to the extent that Regexing would. (Chris)

TheChrisPratt avatar May 08 '19 17:05 TheChrisPratt
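A rough sketch of why a digit check could be cheap: it can ride along with the scan that already visits each character, using one extra flag per word. The function name is hypothetical, and `str.isalnum` is a simplification of whatever word-character test the real scanner uses.

```python
def scan_words_skipping_digits(text):
    """Return the words in text, dropping any word that contains a digit."""
    words = []
    current = []
    has_digit = False
    for ch in text + " ":  # the trailing space flushes the final word
        if ch.isalnum():
            current.append(ch)
            has_digit = has_digit or ch.isdigit()
        else:
            if current and not has_digit:
                words.append("".join(current))
            current = []
            has_digit = False
    return words


kept = scan_words_skipping_digits("Finished in 2.7s")
```

The flag is updated inside the loop that is already running, so the filter adds no second pass over the text.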

Is there any way this could be prioritized?* Hopefully, since 2016 CPU speeds have advanced enough to minimize the performance concerns re: regexes.

In particular, spellcheck false positives (correct text flagged as misspelled) can be so plentiful and (thus) distracting in some languages as to make this otherwise helpful feature a hindrance.

A great example is LaTeX, where 90% of "words" in code show up as incorrect. Even if you whitelist every command, environment, and argument name, nothing can be done about measurements, which need to be decimal values followed immediately (without spaces) by a unit (e.g. cm, in, pt, etc). This causes sublime to register the entire string (e.g. 14pt) as a spell-checkable word, when there are obviously infinite possible strings of the same form.

* As an alternative, one thing that might be easier to implement: allowing a plugin to serve as the spellcheck engine (i.e. in lieu of hunspell).

Another idea (admittedly beyond the scope of this issue): It would be really powerful if syntax files could tell Sublime whether certain text is a keyword or a literal. This could be used to decide whether spellcheck should run on a word.

mvastola avatar Aug 15 '22 00:08 mvastola
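To make the LaTeX case concrete, the measurement strings described above could be recognized with a single pattern, which is the kind of rule a fixed word list cannot express. This is a sketch only: `MEASUREMENT` and `is_measurement` are hypothetical names, and the unit list is an assumption (cm, in, and pt come from the comment; mm, em, and ex are added as further TeX units).

```python
import re

# A decimal number followed immediately by a unit, with no space in between.
MEASUREMENT = re.compile(r"\d+(?:\.\d+)?(?:cm|in|pt|mm|em|ex)")


def is_measurement(token):
    """True if the whole token is a number directly followed by a known unit."""
    return MEASUREMENT.fullmatch(token) is not None


flags = [is_measurement(t) for t in ("14pt", "2.5cm", "point", "14")]
```

One pattern covers every value of the same form, whereas an ignored_words list would need a separate entry for each number.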

I'd love for SublimeHQ to take another look at this. I don't need arbitrary regular expression ignoring, but ignoring words with numbers in them would be a big improvement that should be easy to implement.

Sublime already ignores words that have capital letters. For example, helloAwd isn't marked as misspelled, so I don't know why hello123 is. (This is Sublime functionality; hunspell natively marks them both as incorrect.)

MatthiasPortzel avatar Nov 09 '23 16:11 MatthiasPortzel
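The request above amounts to treating digits the same way capital letters are reportedly treated. A minimal sketch of such a combined check follows; the exact capital-letter rule Sublime applies is not stated in this thread, so "a capital after the first character" is an assumption, and `looks_like_identifier` is a hypothetical name.

```python
def looks_like_identifier(word):
    """True if the word has a capital after its first character, or any digit."""
    has_inner_capital = any(ch.isupper() for ch in word[1:])
    has_digit = any(ch.isdigit() for ch in word)
    return has_inner_capital or has_digit


results = {w: looks_like_identifier(w) for w in ("helloAwd", "hello123", "Hello", "hello")}
```

Under this rule both helloAwd and hello123 would be skipped, while ordinary words such as "Hello" would still be checked.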