cspell-dicts
Support both dictionaries: dictionaries/en_US/en_US.trie.gz and additional_words.txt
Every time a new word is added to dictionaries/en_US/src/additional_words.txt,
dictionaries/en_US/en_US.trie.gz gets rebuilt, and the Git metadata and Git object database are inflated by about 400KB because the rebuilt file is binary.
dictionaries/en_US/CHANGELOG.md | 12 ++
dictionaries/en_US/checksum.txt | 4 +-
dictionaries/en_US/en_US.trie.gz | Bin 401990 -> 401909 bytes
dictionaries/en_US/package.json | 2 +-
dictionaries/en_US/src/additional_words.txt | 2 +
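To illustrate why a two-line change to the word list ends up as a whole new binary object, here is a minimal Python sketch. It uses a made-up word list (not real dictionary data) and hypothetical helper names, and it only compares the text and gzip forms of two versions; it does not model how Git actually stores or delta-compresses objects.

```python
import difflib
import gzip


def changed_lines(old: str, new: str) -> int:
    """Count lines added or removed between two versions of a text file."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))


def shared_prefix(a: bytes, b: bytes) -> int:
    """Length of the identical leading run of two byte strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical sorted word list standing in for a dictionary source.
    words = [f"word{i:05d}" for i in range(5000)]
    old = "\n".join(words) + "\n"
    new = "\n".join(words[:50] + ["word00049a"] + words[50:]) + "\n"

    # mtime=0 keeps the gzip header stable between runs.
    old_gz = gzip.compress(old.encode(), mtime=0)
    new_gz = gzip.compress(new.encode(), mtime=0)

    print("text lines changed:", changed_lines(old, new))
    print("gzip sizes:", len(old_gz), len(new_gz))
    print("gzip bytes identical from the start:", shared_prefix(old_gz, new_gz))
```

The text versions differ by a single line, while the two compressed streams usually stop matching early on, which is consistent with the `Bin 401990 -> 401909 bytes` entry above.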
Do you think cspell could support using both en_US.trie.gz and additional_words.txt for en_US?
That way, the Git object database would not inflate so quickly.
The same concept can be applied to all other languages that use *.trie.gz.
I understand that the size of the Git object database is not a cspell dictionaries issue, but keeping Git objects efficient has its own benefits.
./dictionaries/ar/src/additional_words.txt
./dictionaries/de_CH/src/additional_words.txt
./dictionaries/de_DE/src/additional_words.txt
./dictionaries/en_GB-MIT/src/additional_words.txt
./dictionaries/en_GB/src/additional_words.txt
./dictionaries/en_US/src/additional_words.txt
./dictionaries/es_ES/src/additional_words.txt
./dictionaries/fr_FR_90/src/additional_words.txt
./dictionaries/fr_FR/src/additional_words.txt
./dictionaries/nl_NL/src/additional_words.txt
./dictionaries/pt_BR/src/additional_words.txt
./dictionaries/python/src/additional_words.txt
./dictionaries/ru_RU/src/additional_words.txt
./dictionaries/sl_SI/src/additional_words.txt
./dictionaries/sv/src/additional_words.txt
@vikivivi,
You make a good point.
As far as cspell is concerned, the size of the dictionary doesn't matter, but the number of dictionaries does. So, I'm reluctant to "add" more dictionaries.
Back to the original problem: binary objects taking up a lot of space.
Since trie files are text files, it might be worth keeping just the .trie file instead of the .trie.gz and compressing it to .gz during publication to npm. Using .trie files will keep the stored object size smaller, even though the files themselves are bigger, because it is possible to "diff" them. .trie files are stored because they take a long time to build. .trie.gz files have been stored because they are smaller, but as you point out, since they are binary, in the long run they take up more space.
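As a rough illustration of the "compress during publication" step, here is a minimal sketch. It is written in Python purely for illustration; the actual build would more likely be a Node script, and the function name `compress_trie` and the file contents are assumptions, not part of cspell's tooling.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def compress_trie(src: Path) -> Path:
    """Write a gzip copy of src next to it (src name + '.gz') and return its path."""
    dest = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


if __name__ == "__main__":
    # Demo with a throwaway file; the content is a placeholder, not a real trie.
    with tempfile.TemporaryDirectory() as tmp:
        trie = Path(tmp) / "en_US.trie"
        trie.write_text("placeholder trie content\n", encoding="utf-8")
        print(compress_trie(trie).name)
```

With a step like this run before publishing, only the text .trie file would need to be committed, and the generated .trie.gz could be left out of Git.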
Since trie files are text files, it might be worth it to just keep the .trie file instead of the .trie.gz and to compress the .gz during publication to npm.
.... because it is possible to "diff" the files.
I support this.
I'm going to close this, since moving to text files addresses the issue.