Add Tools to Scale Compression Resources on Constrained Systems
**Is your feature request related to a problem? Please describe.**
Developers who want to use zstd's higher compression modes face difficult choices when their application ships to, and runs on, highly disparate devices. On high-powered, server-class hardware, level 19 (or even higher) may be appropriate, but the same level on a constrained device (e.g., a Raspberry Pi) may take unconscionably long to complete, or even crash.

**Describe the solution you'd like**
zstd should offer users tools to scale down the resource consumption of compression based on the constraints of the system. The three relevant constraints are probably time, memory, and threads. The most pressing is memory consumption, since exceeding it can lead to crashes.
There are a number of ways to indicate these constraints and to reconcile them with the overall compression intent. My expectation is that the most straightforward is for users to select the compression level (etc.) they'd like to use in the absence of constraints, and then supply any constraints; it would then be zstd's job to reconcile them if they conflict. E.g.:
```c
ZSTD_compressionParameters
ZSTD_selectCParamsMatchingConstraints(int cLevel, size_t inputSize, size_t maxMem)
{
    ZSTD_compressionParameters cParams;
    do {
        cParams = ZSTD_getCParams(cLevel, inputSize, 0);
        size_t const size = ZSTD_estimateCStreamSize_usingCParams(cParams);
        if (size <= maxMem) break;
    } while (--cLevel >= -1);  /* cctx doesn't get any smaller after -1 */
    return cParams;
}
```
A follow-up topic would be whether zstd could determine these constraints itself (like -T0 does for threads).
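As a sketch of that auto-detection idea, here is one way to query total physical memory on POSIX systems; the helper name is hypothetical, not an existing zstd API, and non-POSIX platforms would need their own paths:

```c
#include <unistd.h>

/* Hypothetical helper: total physical memory in bytes on POSIX
 * systems, as a starting point for deriving a default memory cap
 * (analogous to how -T0 queries the core count). Returns 0 if the
 * sysconf values are unavailable. */
static unsigned long long total_physical_memory(void)
{
    long const pages = sysconf(_SC_PHYS_PAGES);
    long const page_size = sysconf(_SC_PAGE_SIZE);
    if (pages <= 0 || page_size <= 0) return 0;
    return (unsigned long long)pages * (unsigned long long)page_size;
}
```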
**Describe alternatives you've considered**
Users already have the tools to make these choices themselves. However, that places a significant burden on them: they have to understand how compression parameters map to memory usage, etc.

**Additional context**
This came up on the Ubuntu mailing list in the context of rebuilding the initramfs.
In short, I am proposing that, to start, we add a CLI flag `--max-mem`. Ideally, it could take arguments of the form `x%`, which would limit zstd to using that percentage of total system memory.
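A minimal sketch of how such an argument might be parsed; `parse_max_mem` is a hypothetical name, and the caller is assumed to supply the system's total memory in bytes:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for a --max-mem argument: "x%" is interpreted
 * as a percentage of totalMem (supplied by the caller), anything
 * else as an absolute byte count. Returns 0 on a malformed argument.
 * Name and semantics are illustrative, not an existing zstd API. */
static unsigned long long parse_max_mem(const char* arg,
                                        unsigned long long totalMem)
{
    char* end = NULL;
    unsigned long long const v = strtoull(arg, &end, 10);
    if (end == arg) return 0;          /* no digits */
    if (strcmp(end, "%") == 0) {
        if (v > 100) return 0;         /* reject >100% */
        return totalMem / 100 * v;     /* percentage of total */
    }
    if (*end == '\0') return v;        /* absolute byte count */
    return 0;                          /* trailing garbage */
}
```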
For the non-memory constraints, I am wondering whether we could do adaptive compression: specify a target file size / compression ratio and a target time, and have zstd adjust settings as it processes blocks to reach a (Pareto?) optimum. But that might be too weird.
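The per-block feedback rule could look something like the sketch below. This is only the adjustment step, with illustrative bounds and thresholds; a real implementation (cf. zstd's existing `--adapt` mode, which adapts the level to I/O conditions) would also track speed trends and clamp against the memory constraint:

```c
/* Hypothetical per-block controller: after each block, compare the
 * measured compression time against the per-block time budget and
 * nudge the level down when over budget, up when well under it.
 * Bounds and the 0.5 headroom threshold are arbitrary choices. */
static int adjust_level(int level, double blockSeconds, double budgetSeconds)
{
    int const minLevel = 1;
    int const maxLevel = 19;
    if (blockSeconds > budgetSeconds && level > minLevel)
        return level - 1;              /* too slow: back off */
    if (blockSeconds < budgetSeconds * 0.5 && level < maxLevel)
        return level + 1;              /* plenty of headroom: push */
    return level;                      /* within band: hold */
}
```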
Some additional data: as a follow-up to your note on the Ubuntu mailing list (and my apologies it's taken so long to post this!), I thought I'd add a quick link here to the analysis repo we've put some data and results into. It includes the performance of various tools (including zstd) on a variety of architectures and memory sizes, but only for initrd compression (with all the assumptions and limitations that implies).