💡: Allow prefix+ to be completed by words in another dictionary
Problem
I would like to allow lib+ as a custom prefix that must be followed by any other valid English word.
Solution
At the moment this doesn't seem possible - if I add
lib+
apple
to my custom dictionary, then libapple is valid but liborange is invalid. I would like lib+ to also match liborange (without needing to add apple to my dictionary). I would like libasdf to continue to be invalid.
@jwhitaker-gridcog,
Thank you for the suggestion.
Supported Compound Syntax
Custom dictionaries / word lists already support compounding words. There are two special symbols, * and +:
| symbol | meaning |
|---|---|
| `*` | Optional compounding - can stand alone. |
| `+` | Compound Required - cannot stand alone. |
Here is a word list that uses them: software-terms/src/coding-compound-terms.txt.
Example:
words: i*, *object*, *reference*, un+;
Some Valid words: objectreference, referenceobject, ireferenceobject, objectobject, unreference, unobject.
Some unknown words: referenceiobject, un, iun, objectunreference.
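To make the rules above concrete, here is a minimal sketch of a checker. It is a simplified model inferred from the example word list and its valid/unknown results, not cspell's actual implementation; the names `parse_entry` and `make_checker` are made up for illustration. The assumption is that a leading marker controls whether an entry may be joined onto a previous segment, and a trailing marker controls whether another segment may (or must) follow.

```python
from functools import lru_cache

def parse_entry(entry):
    """Split an entry such as '*object*' into (word, left_marker, right_marker)."""
    left = entry[0] if entry[0] in "*+" else ""
    right = entry[-1] if entry[-1] in "*+" else ""
    return entry.strip("*+"), left, right

def make_checker(entries):
    """Build is_valid(text) for a word list that uses the * and + markers."""
    parsed = [parse_entry(e) for e in entries]

    def is_valid(text):
        @lru_cache(maxsize=None)
        def match(pos, joined):
            # joined is True when a previous segment was compounded onto this position.
            for word, left, right in parsed:
                if not text.startswith(word, pos):
                    continue
                # Assumed left rules: "" may only start a word,
                # "+" may only follow another segment, "*" may do either.
                if joined and left == "":
                    continue
                if not joined and left == "+":
                    continue
                end = pos + len(word)
                if end == len(text):
                    # Assumed right rule: a trailing "+" cannot end the word.
                    if right != "+":
                        return True
                    continue
                # More text remains, so this segment must allow a follower.
                if right != "" and match(end, True):
                    return True
            return False

        return match(0, False)

    return is_valid

is_valid = make_checker(["i*", "*object*", "*reference*", "un+"])
print(is_valid("unreference"), is_valid("iun"))
```

Under these assumptions the sketch agrees with the valid and unknown words listed above, for example accepting `unreference` and rejecting `iun`.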
Cross Dictionary Compounding
It is unlikely that cspell will support that. It has been asked for a few times. There are two main reasons behind not supporting it:
- Performance - It makes looking up words very expensive.
- False Negatives - Words that are clear misspellings are considered correct. See #317
From my perspective, the second point (false negatives) is not a concern. If I add "lib+" to my dictionary I would be aware that the result will be "libauthorization", "libencoding", "libpin", "libpen", and "libpun" all being accepted. I'm not relying on cspell for code correctness, that's what my type checker / compiler is for. :)
On performance, out of interest would it have any different characteristics than adding "lib+" to the built in en dictionary?
It is easy to think of cases where you want them combined; it is the unexpected combinations that are an issue. Is librare a desired word or a misspelling of library?
Why not build your own word list?
There are also other reporting modes that might make more sense in your case: --report typos or --report simple.
The time to look up a word isn't based on the size of the dictionary. It depends on the length of the word being looked up, the number of dictionaries, and the number of compound branches. Internally, every dictionary is a trie-like DAG. A normal lookup is a single path through the DAG. Compound words mean there are multiple possible paths through the DAG. An A*-like algorithm is used to look up words and make suggestions.
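As a rough illustration of why compounding across dictionaries adds work, here is a toy sketch. It is not cspell's code: it uses plain nested dicts as tries and a depth-first search rather than the A*-like algorithm mentioned above, and the function names and sample words are invented. It also treats every word as compoundable, which is broader than a single `lib+` rule.

```python
END = ""  # end-of-word marker; never collides with a real character

def build_trie(words):
    """Build a nested-dict trie from an iterable of non-empty words."""
    root = {}
    for word in words:
        if not word:
            continue
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def lookup(root, text):
    """Plain lookup: a single path, one step per character."""
    node = root
    for ch in text:
        node = node.get(ch)
        if node is None:
            return False
    return END in node

def cross_compound_lookup(roots, text):
    """Lookup that may restart in ANY dictionary at every word boundary.

    Returns (found, steps), where steps counts trie edges followed.
    """
    steps = 0
    stack = [(0, root) for root in roots]
    while stack:
        pos, node = stack.pop()
        if END in node:
            if pos == len(text):
                return True, steps
            # Word boundary: every dictionary becomes a new branch to explore.
            stack.extend((pos, root) for root in roots)
        if pos < len(text):
            child = node.get(text[pos])
            if child is not None:
                steps += 1
                stack.append((pos + 1, child))
    return False, steps

custom = build_trie(["lib"])
english = build_trie(["apple", "orange", "library"])
print(lookup(english, "library"))
print(cross_compound_lookup([custom, english], "liborange"))
```

The plain lookup follows at most one edge per character, while the cross-dictionary version opens one extra branch per dictionary at each word boundary, which is where the extra cost comes from in this model.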
> It is easy to think of cases where you want them combined; it is the unexpected combinations that are an issue. Is librare a desired word or a misspelling of library?
I am aware this rule will result in false positives. I expected that to be the behaviour of a lib+ rule, so I would not be surprised that 'librare' is now considered valid. For context: https://packages.debian.org/search?keywords=libr&searchon=names&suite=stable&section=all :)
'User story': I am using cspell in my IDE as a best-effort helper, and I am not fussed if it misses things. False negatives are a lot more annoying than false positives for me because they result in maintenance for my dictionary and a pull request to our shared code base containing it. As such, I'd be a little bit surprised if it had false positives out of the box, but if I configured them myself I wouldn't mind at all.
> Why not build your own word list?
For what I want, I'd need to fork the en-au word list, which doesn't really appeal. :)
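For reference, one way to build such a word list without maintaining a fork by hand might be to generate the prefixed forms from an existing plain-text word list. This is only a sketch under assumptions: the helper name `expand_prefix` is hypothetical, and it assumes the source list has one word per line with optional `#` comments.

```python
def expand_prefix(prefix, lines):
    """Return the sorted, de-duplicated prefixed forms of the words in lines.

    Blank lines, '#' comments, and entries containing non-letters are skipped.
    """
    out = set()
    for line in lines:
        word = line.split("#", 1)[0].strip()
        if word.isalpha():
            out.add(prefix + word.lower())
    return sorted(out)

sample = ["apple", "Orange", "", "# a comment", "apple"]
for entry in expand_prefix("lib", sample):
    print(entry)
```

The output could be saved as a custom word list, at the cost of regenerating it whenever the source list changes.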
@jwhitaker-gridcog,
Which IDE are you using? There might be some settings that can hide most issues.
There are a few settings:
- `allowCompoundWords` - turns on compounding within all dictionaries. A bit extreme.
- `unknownWords` - Using `report-common-typos` might be what you are looking for.
As for cross-dictionary compounding, it is unlikely to be supported.