Questions regarding ZSTD dictionary training.
1: Are there pre-trained dictionaries I can get somewhere?
2: Is there a "standard ratio" between dictionary size and RAM usage?
3: I set up zstd:8 with forced compression on every file, and had Copilot write a script that splits every larger file into 128KB chunks so training can run on each chunk. Even if the first block of a file doesn't compress at the same ratio, the idea is to take huge game files (every ".bundle" file, for instance), split them into 128KB chunks, and train on every one of them to get the best compression possible. (A rough sketch of this workflow is below, after this list.)
4: Related to question 2: I want the best compression I can get on every file I write, so how should I size the dictionary relative to the RAM it will use?
5: Is there a way to parallelize the chunk training? Copilot claimed the training is single-threaded only, and with a 5900X I want reading and writing to compress/decompress files asynchronously, much more effectively and faster. (See the second sketch below for a per-file-type approach.)
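For question 3, here is a minimal sketch of the kind of script I mean, written against the python-zstandard package rather than the actual Copilot script; the sample directory, the ~110KB target dictionary size, and the helper name chunks_from_file are my own placeholders:

```python
#!/usr/bin/env python3
"""Sketch: split large files into 128 KiB chunks and train a zstd dictionary on them.

Assumptions (not from the post itself): the python-zstandard package
("pip install zstandard"), a samples/ directory holding the .bundle files,
and a ~110 KiB target dictionary size.
"""
from pathlib import Path
import zstandard as zstd

CHUNK_SIZE = 128 * 1024          # 128 KiB chunks, as described above
DICT_SIZE = 110 * 1024           # target dictionary size; tune as needed
SAMPLE_DIR = Path("samples")     # hypothetical directory of .bundle files

def chunks_from_file(path: Path, chunk_size: int = CHUNK_SIZE):
    """Yield fixed-size chunks of a file as bytes objects."""
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk

# Collect training samples: every 128 KiB chunk of every .bundle file.
# Note this keeps the whole corpus in RAM; for a very large corpus you
# would subsample the chunks instead of loading all of them.
samples = [
    chunk
    for path in SAMPLE_DIR.rglob("*.bundle")
    for chunk in chunks_from_file(path)
]

# Train the dictionary and write it out so it can be reused later.
dictionary = zstd.train_dictionary(DICT_SIZE, samples)
Path("bundle.dict").write_bytes(dictionary.as_bytes())

# Compress one chunk with the trained dictionary at level 8 as a quick check.
cctx = zstd.ZstdCompressor(level=8, dict_data=dictionary)
print(f"{len(samples[0])} -> {len(cctx.compress(samples[0]))} bytes with the dictionary")
```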
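For question 5, assuming the trainer really is single-threaded in my setup (as Copilot claims; some builds expose a thread count for the dictionary parameter search, so that may be worth checking), one workaround I'm considering is training one dictionary per file type in parallel processes. A rough sketch with made-up paths and extensions:

```python
#!/usr/bin/env python3
"""Sketch: train one dictionary per file extension in parallel worker processes.

If a single train_dictionary() call only uses one core, running several
independent training jobs at once (one per file type) still spreads the work
across a 12-core 5900X. Directory layout and extension list are assumptions.
"""
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
import zstandard as zstd

CHUNK_SIZE = 128 * 1024
DICT_SIZE = 110 * 1024
SAMPLE_DIR = Path("samples")
EXTENSIONS = [".bundle", ".pak", ".dat"]   # hypothetical file types to cover

def train_for_extension(ext: str) -> str:
    """Collect 128 KiB chunks for one extension and train its own dictionary."""
    samples = []
    for path in SAMPLE_DIR.rglob(f"*{ext}"):
        with path.open("rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                samples.append(chunk)
    dictionary = zstd.train_dictionary(DICT_SIZE, samples)
    out = Path(f"{ext.lstrip('.')}.dict")
    out.write_bytes(dictionary.as_bytes())
    return f"{ext}: trained on {len(samples)} chunks -> {out}"

if __name__ == "__main__":
    # One worker process per file type; each training run gets its own core.
    with ProcessPoolExecutor(max_workers=len(EXTENSIONS)) as pool:
        for line in pool.map(train_for_extension, EXTENSIONS):
            print(line)
```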
Update: with Copilot's help I've made a script that builds an SQL database of every unique pattern for a few file types, and the resulting database files are roughly 30GB each. My question now is: how big a "dataset" of unique patterns is "enough" to train a dictionary of a given size? Or rather, how many patterns can a 128KB dictionary actually fit when training at the standard dictionary size?
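One way I could try to answer this empirically, rather than by counting patterns: train dictionaries on progressively larger slices of the chunk set and measure the compression ratio on held-out chunks; once the ratio stops improving, the dataset is presumably big enough for that dictionary size. A rough sketch, again assuming python-zstandard and my placeholder paths:

```python
#!/usr/bin/env python3
"""Sketch: check when more training data stops improving a fixed-size dictionary.

Train on growing slices of the chunk set and compare compression ratios on
held-out chunks. Paths, the 128 KiB chunking, and the starting sample count
are assumptions carried over from the sketches above.
"""
import random
from pathlib import Path
import zstandard as zstd

CHUNK_SIZE = 128 * 1024
DICT_SIZE = 128 * 1024           # the "128KB dictionary" from the question
SAMPLE_DIR = Path("samples")

# Gather all chunks once, then hold out roughly 10% for evaluation.
chunks = []
for path in SAMPLE_DIR.rglob("*.bundle"):
    with path.open("rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            chunks.append(chunk)
random.shuffle(chunks)
split = max(1, len(chunks) // 10)
held_out, train_pool = chunks[:split], chunks[split:]

def ratio(dictionary: zstd.ZstdCompressionDict) -> float:
    """Total compressed size / total original size over the held-out chunks."""
    cctx = zstd.ZstdCompressor(level=8, dict_data=dictionary)
    return sum(len(cctx.compress(c)) for c in held_out) / sum(len(c) for c in held_out)

# Double the training-set size each round and watch the held-out ratio.
n = 1000
while n <= len(train_pool):
    d = zstd.train_dictionary(DICT_SIZE, train_pool[:n])
    print(f"{n:>8} samples -> held-out ratio {ratio(d):.4f}")
    n *= 2
```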