💡: Allow prefix+ to be completed by words in another dictionary

Open jwhitaker-gridcog opened this issue 5 months ago • 6 comments

Problem

I would like to allow lib+ as a custom prefix that must be followed by any other valid English word.

Solution

At the moment this doesn't seem possible - if I add

lib+
apple

to my custom dictionary, then libapple is valid but liborange is invalid. I would like lib+ to also match liborange (and to not need to add apple to my dictionary). I would like libasdf to continue to be invalid.

Alternatives

No response

Additional Context

No response

Code of Conduct

  • [x] I agree to follow this project's Code of Conduct

jwhitaker-gridcog avatar Jul 09 '25 02:07 jwhitaker-gridcog

@jwhitaker-gridcog,

Thank you for the suggestion.

Supported Compound Syntax

Custom dictionaries / word lists already support being able to specify compounding words. There are two special symbols * and +:

Symbol  Meaning
*       Optional compounding - can stand alone.
+       Compound required - cannot stand alone.

Here is a word list that uses them: software-terms/src/coding-compound-terms.txt.

Example:

words: i*, *object*, *reference*, un+;

Some valid words: objectreference, referenceobject, ireferenceobject, objectobject, unreference, unobject. Some unknown words: referenceiobject, un, iun, objectunreference.
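To make the two symbols concrete, here is a minimal Python sketch of the rules as they can be inferred from the example above. It is not cspell's implementation. It assumes that two parts may be joined only when the left part has a trailing marker and the right part has a leading marker, that a part with a trailing + cannot end a word, and that a part with a leading + cannot start one. The names `Entry`, `parse_entry`, and `make_checker` are hypothetical.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Entry:
    text: str
    joins_left: bool   # leading * or +: may follow another part
    needs_left: bool   # leading +: must follow another part
    joins_right: bool  # trailing * or +: may be followed by another part
    needs_right: bool  # trailing +: must be followed by another part


def parse_entry(raw: str) -> Entry:
    """Split a word-list entry such as '*object*' or 'un+' into text and markers."""
    left = raw[0] if raw[0] in "*+" else ""
    right = raw[-1] if raw[-1] in "*+" else ""
    text = raw[len(left): len(raw) - len(right)]
    return Entry(text, bool(left), left == "+", bool(right), right == "+")


def make_checker(raw_entries):
    """Return is_valid(word) for a single word list (assumed rules, see above)."""
    entries = [parse_entry(r) for r in raw_entries]

    def is_valid(word: str) -> bool:
        @lru_cache(maxsize=None)
        def fits(pos: int) -> bool:
            # Try every entry that matches the word at this position.
            for e in entries:
                if not word.startswith(e.text, pos):
                    continue
                if pos == 0 and e.needs_left:
                    continue  # cannot start a word
                if pos > 0 and not e.joins_left:
                    continue  # cannot be attached to a previous part
                end = pos + len(e.text)
                if end == len(word):
                    if not e.needs_right:
                        return True  # allowed to end the word
                elif e.joins_right and fits(end):
                    return True
            return False

        return fits(0)

    return is_valid


is_valid = make_checker(["i*", "*object*", "*reference*", "un+"])
for candidate in ["objectreference", "unreference", "un", "referenceiobject"]:
    print(candidate, is_valid(candidate))
```

Under these assumptions the sketch accepts the valid words listed above and rejects the unknown ones, all within one word list. It never consults a second dictionary.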

Cross Dictionary Compounding

It is unlikely that cspell will support compounding across dictionaries. It has been asked for a few times. There are two main reasons behind not supporting it:

  1. Performance - It makes looking up words very expensive.
  2. False Negatives - Words that are clear misspellings are considered correct. See #317

Jason3S avatar Jul 12 '25 05:07 Jason3S

From my perspective 2. is not a concern. If I add "lib+" to my dictionary I would be aware that the result will be "libauthorization", "libencoding", "libpin", "libpen", and "libpun" all being accepted. I'm not relying on cspell for code correctness, that's what my type checker / compiler is for. :)

jwhitaker-gridcog avatar Jul 13 '25 23:07 jwhitaker-gridcog

On performance, out of interest, would it have any different characteristics than adding "lib+" to the built-in en dictionary?

jwhitaker-gridcog avatar Jul 13 '25 23:07 jwhitaker-gridcog

It is easy to think of cases where you want them combined; it is the unexpected combinations that are an issue. Is librare a desired word or a misspelling of library?

Why not build your own word list?

There are also other reporting modes that might make more sense in your case: --report typos or --report simple.

The time to look up a word isn't based on the size of the dictionary. It depends on the length of the word being looked up, the number of dictionaries, and the number of compound branches. Internally, every dictionary is a Trie-like DAG. A normal lookup is a single path through the DAG. Compound words mean there are multiple possible paths through the DAG. An A*-like algorithm is used to look up words and make suggestions.
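The difference between a single path and multiple paths can be illustrated with a small Python sketch. This is not cspell's code: it uses a plain dictionary-based trie and a depth-first search rather than a DAG with an A*-like search. The word list and the function names (`build_trie`, `lookup`, `lookup_compound`) are made up for illustration. The compound lookup lets any word follow any other word, which is roughly what open-ended compounding would require.

```python
END = "$"  # key marking "a word ends at this node"


def build_trie(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def lookup(root, word):
    """Plain lookup: one path through the trie, at most len(word) steps."""
    node = root
    steps = 0
    for ch in word:
        node = node.get(ch)
        steps += 1
        if node is None:
            return False, steps
    return END in node, steps


def lookup_compound(root, word):
    """Lookup where any words may be joined: every word end is a branch point."""
    steps = 0
    stack = [(0, root)]  # (position in word, current trie node)
    while stack:
        pos, node = stack.pop()
        steps += 1
        if pos == len(word):
            if END in node:
                return True, steps
            continue
        if END in node:
            # A word just ended here: also try starting a new word at the root.
            stack.append((pos, root))
        child = node.get(word[pos])
        if child is not None:
            stack.append((pos + 1, child))
    return False, steps


trie = build_trie(["lib", "library", "rare", "ra", "re", "a"])
print(lookup(trie, "librare"))           # plain lookup: not a word in the list
print(lookup_compound(trie, "librare"))  # found as lib + rare, after exploring more states
```

In the plain lookup the work is bounded by the length of the word. In the compound lookup every node that ends a word adds another branch to explore, so the cost grows with the number of possible split points rather than with the size of the word list.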

Jason3S avatar Jul 17 '25 07:07 Jason3S

It is easy to think of cases where you want them combined; it is the unexpected combinations that are an issue. Is librare a desired word or a misspelling of library?

I am aware this rule will result in false positives. I would expect that behaviour from a rule like lib+, so I would not be surprised that 'librare' is now considered valid. For context: https://packages.debian.org/search?keywords=libr&searchon=names&suite=stable&section=all :)

'User story': I am using cspell in my IDE as a best-effort helper, and I am not fussed if it misses things. False negatives are a lot more annoying than false positives for me because they result in maintenance for my dictionary and a pull request to our shared code base containing it. As such, I'd be a little bit surprised if it had false positives out of the box, but if I configured them myself I wouldn't mind at all.

Why not build your own word list?

For what I want, I'd need to fork the en-au word list to do this, which doesn't really appeal. :)

jwhitaker-gridcog avatar Jul 17 '25 09:07 jwhitaker-gridcog

@jwhitaker-gridcog,

Which IDE are you using? There might be some settings that can hide most issues.

There are a few settings:

  • allowCompoundWords - turns on compounding within all dictionaries. A bit extreme.
  • unknownWords - Using report-common-typos might be what you are looking for.

As for cross-dictionary compounding, it is unlikely to be supported.

Jason3S avatar Jul 21 '25 04:07 Jason3S