zstd
Add compressionLevel parameter to ZDICT_trainFromBuffer
Currently, ZDICT_trainFromBuffer (lib/dictBuilder/zdict.c) trains a dictionary using compression level ZSTD_CLEVEL_DEFAULT (3). I would like to be able to pass in a compression level, and from what I can tell there's no way to do this currently. Would it be possible to add functionality to do so?
We can't do that because ZDICT_trainFromBuffer is part of our ABI stable interface.
I'd recommend trying the ZDICT_optimizeTrainFromBuffer_cover or the ZDICT_optimizeTrainFromBuffer_fastCover function, depending on whether you care about the speed of dictionary training. You can tune the steps parameter to make training faster.
Under the hood ZDICT_trainFromBuffer is just calling ZDICT_optimizeTrainFromBuffer_fastCover:
https://github.com/facebook/zstd/blob/466e13f7225387a4276738f421378396595a5d4a/lib/dictBuilder/zdict.c#L1101-L1117
Longer term, we want to improve our dictionary builder interface, and provide a richer interface as part of our ABI stable interface. But we haven't found the time to do it, and don't think that the current unstable interface is good enough to be promoted to stable.
We can't do that because ZDICT_trainFromBuffer is part of our ABI stable interface.
Currently, the stable API has two dictionary training functions:
ZDICT_trainFromBuffer() https://github.com/facebook/zstd/blob/43f21a600ec431aa615b09868f1f586b949607fb/lib/dictBuilder/zdict.c#L1101-L1102
ZDICT_finalizeDictionary() https://github.com/facebook/zstd/blob/43f21a600ec431aa615b09868f1f586b949607fb/lib/dictBuilder/zdict.c#L852-L855
To avoid breaking the stable API, you could let ZDICT_finalizeDictionary() do this work:
if (customDictContent == NULL && dictContentSize == 0) {
// No custom dictionary, just train the samples with specified level.
}
If so, maybe ZDICT_trainFromBuffer() can be marked as deprecated.
Thanks both, much appreciated!
Since ZDICT_optimizeTrainFromBuffer_fastCover isn't in zdict.h like ZDICT_trainFromBuffer is, how should I #include it in my code?
Also, would using ZDICT_finalizeDictionary in this way generate the same dictionary as ZDICT_trainFromBuffer? I'm not sure exactly what the standard usage of ZDICT_finalizeDictionary is.
Since ZDICT_optimizeTrainFromBuffer_fastCover isn't in zdict.h like ZDICT_trainFromBuffer is, how should I #include it in my code?
To use the experimental API, define this macro before including zdict.h; the function is then declared in zdict.h.
#define ZDICT_STATIC_LINKING_ONLY
Also, would using ZDICT_finalizeDictionary in this way generate the same dictionary as ZDICT_trainFromBuffer? I'm not sure exactly what the standard usage of ZDICT_finalizeDictionary is.
Not at present. It's a feature request.
Please reopen this issue. It has been assigned to a core developer and may be resolved in the future.
I'm going to close this issue as it seems like the immediate question has been answered.
We don't have immediate plans to refactor the dictionary builder API, simply because we don't have the time to dedicate to it. But it is a long term objective.