
Swapping HTML character entities before spell checking

Open • nschonni opened this issue 10 months ago • 1 comment

Was running into some split words when spell checking French text, since it used HTML entities in some escaped code blocks. Was thinking that, at least for the letter-based entities (https://html.spec.whatwg.org/multipage/named-characters.html#named-character-references), it might make sense to set up a small repMap array for the HTML dictionary. Thought I'd open an issue before trying any coding to see whether it makes sense, and whether it would even cascade out when the html dictionary is referenced in a downstream cspell.json.

nschonni, Feb 09 '25 20:02
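
To illustrate the kind of splitting described in this report, here is a minimal Python sketch. The tokenizer is hypothetical and much simpler than cspell's real word splitting (it just treats any run of letters as a word), and the sample word is an assumption chosen for illustration, not taken from the reporter's files.

```python
import re

# Hypothetical, simplified tokenizer: every run of Unicode letters is a "word".
# This is NOT cspell's actual splitting logic, only an illustration of the effect.
WORD_RE = re.compile(r"[^\W\d_]+")


def split_words(text: str) -> list[str]:
    """Return the letter-only runs found in text."""
    return WORD_RE.findall(text)


# With the entity left in place, '&' and ';' break the word into fragments.
print(split_words("fran&ccedil;ais"))

# With the literal character, the word stays in one piece.
print(split_words("français"))
```

Each fragment would then be looked up on its own, which is why a per-word replacement applied afterwards cannot put the original word back together.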

@nschonni,

You are seeing one of the current limitations of the spell checker. repMap is applied too late: only just before a word is checked against the dictionary that defines it, by which point the text has already been split into words.

What is needed in this case is a preprocessing step, one that transforms the document before it is spell checked. That is the purpose of: cspell/rfc/rfc-0003 parsing

Jason3S, Feb 10 '25 07:02
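
As a rough sketch of the preprocessing idea from the reply above, the following Python snippet decodes HTML character references before any words are extracted. It uses the standard library's html.unescape; the tokenizer and the function names preprocess and words_to_check are hypothetical, and none of this reflects how cspell or the parsing RFC actually implements the step.

```python
import html
import re

# Hypothetical, simplified tokenizer (runs of Unicode letters), not cspell's.
WORD_RE = re.compile(r"[^\W\d_]+")


def preprocess(text: str) -> str:
    """Decode HTML character references (e.g. &ccedil;) into literal characters."""
    return html.unescape(text)


def words_to_check(text: str) -> list[str]:
    """Transform the document first, then extract the words to spell check."""
    return WORD_RE.findall(preprocess(text))


print(words_to_check("Le fran&ccedil;ais est &eacute;crit ici."))
```

One design point a real implementation would have to handle: decoding changes the length of the text, so reported issue positions would need to be mapped back to offsets in the original document. This sketch ignores that.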