Any plan to make a pigzlib like zlib?

Open sfchen opened this issue 8 years ago • 11 comments

A zlib-like library for pigz is wanted.

sfchen avatar Nov 29 '17 00:11 sfchen

I have considered adding multi-threaded compression to zlib. However I'm not sure what sort of interface people would be looking for. What do you imagine the interface would look like?

madler avatar Nov 29 '17 00:11 madler

It would be great if zlib were multi-threaded. It would be even better if the multi-threaded APIs were similar to the current zlib ones.

sfchen avatar Nov 29 '17 00:11 sfchen

Similar how? More importantly, different how? How would the multi-threading be controlled? I would like to have a specific design for the interface with some level of consensus from potential users before implementing it.

madler avatar Nov 29 '17 00:11 madler

I mostly access zlib as is through gzgetc and gzgets, which don’t have terribly intuitive liftovers. More intuitive would be a pointer to read from like a file, but at that point you might as well just use popen with a shell call to pigz.

dnbaker avatar Nov 29 '17 01:11 dnbaker
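For reference, the popen route mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: pigz is installed and on the PATH, the path contains no single quotes, and the helper names build_pigz_cmd, pigz_popen_read, and count_lines are made up for this example. Only popen, pclose, fgets, and the pigz -d and -c options are existing interfaces.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes popen/pclose under strict -std modes */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "pigz -dc '<path>'" into out.
   Returns 0 on success, -1 if the path contains a single quote
   (not handled in this sketch) or the command does not fit. */
int build_pigz_cmd(char *out, size_t cap, const char *path) {
    assert(out != NULL && path != NULL);
    if (strchr(path, '\'') != NULL) return -1;
    int n = snprintf(out, cap, "pigz -dc '%s'", path);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Hypothetical helper: opens a read pipe on the decompressed stream.
   The caller closes it with pclose(). Returns NULL on failure. */
FILE *pigz_popen_read(const char *path) {
    char cmd[4096];
    if (build_pigz_cmd(cmd, sizeof cmd, path) != 0) return NULL;
    return popen(cmd, "r");
}

/* Counts newline-terminated lines on any stream with fgets, the same
   way one would loop over gzgets. Long lines may span several reads,
   so only a chunk that ends in '\n' is counted. */
long count_lines(FILE *in) {
    char buf[1024];
    long lines = 0;
    assert(in != NULL);
    while (fgets(buf, sizeof buf, in) != NULL) {
        size_t len = strlen(buf);
        if (len > 0 && buf[len - 1] == '\n') lines++;
    }
    return lines;
}
```

A caller would do something like FILE *in = pigz_popen_read("reads.gz"); check it for NULL, pass it to count_lines(in), and finish with pclose(in). The reading loop is ordinary stdio, which is the sense in which a library adds little over the pipe for this use case.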

"liftover"?

madler avatar Nov 29 '17 01:11 madler

I meant conversion or adaptation (i.e., making a pigz version of gzwrite/gzread/gzgetc/gzgets). I mostly imagine that making those successive calls wouldn’t often benefit from parallelization unless you were filling a sufficiently large buffer and then dispatching its compression in parallel as needed.

I guess it mostly just depends on how the implementation works. I do imagine I’d want to set the number of threads at file handle creation and leave the arguments to functions the same as their serial counterparts.

dnbaker avatar Nov 29 '17 01:11 dnbaker

Parallel compression needs much more memory than single-thread compression, both for large data buffers and for the multiple compression engines themselves. gzwrite does not need to do anything right away. You could send it small amounts of data and it could accumulate it in a buffer until it has enough to send to a compression engine in a thread. The user would need to say how many threads they want, and how much memory to use, implying an acceptable latency on accumulating data for chunks of compression.

madler avatar Nov 29 '17 02:11 madler
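The accumulate-then-dispatch behavior described above could look roughly like the sketch below. Everything here is hypothetical: the pz_ names are not part of zlib or pigz, the even split of the memory budget into one chunk per thread is an assumption made for the example, and the callback stands in for handing a chunk to a compression thread. No compression or threading actually happens. The write call keeps the argument shape of gzwrite(file, buf, len), and the thread count and memory budget are given once, when the handle is created.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: receives one full (or final partial) chunk.
   A real library would queue the chunk for a compression thread here. */
typedef void (*pz_dispatch_fn)(const unsigned char *chunk, size_t len, void *ctx);

/* Hypothetical writer handle. */
typedef struct {
    unsigned char *buf;      /* accumulation buffer for the current chunk */
    size_t cap;              /* chunk size in bytes */
    size_t used;             /* bytes accumulated so far */
    int threads;             /* requested worker count (recorded only) */
    pz_dispatch_fn dispatch;
    void *ctx;
} pz_writer;

/* Threads and memory budget are fixed at creation.
   Assumption: the budget is split evenly, one chunk per thread. */
pz_writer *pz_open(int threads, size_t mem_bytes, pz_dispatch_fn fn, void *ctx) {
    assert(fn != NULL);
    if (threads < 1 || mem_bytes < (size_t)threads) return NULL;
    pz_writer *w = malloc(sizeof *w);
    if (w == NULL) return NULL;
    w->cap = mem_bytes / (size_t)threads;
    w->buf = malloc(w->cap);
    if (w->buf == NULL) { free(w); return NULL; }
    w->used = 0;
    w->threads = threads;
    w->dispatch = fn;
    w->ctx = ctx;
    return w;
}

/* Small writes only fill the buffer; a chunk is dispatched when full.
   Returns the number of bytes accepted, like gzwrite. */
int pz_write(pz_writer *w, const void *data, unsigned len) {
    const unsigned char *p = data;
    unsigned left = len;
    while (left > 0) {
        size_t room = w->cap - w->used;
        size_t n = left < room ? left : room;
        memcpy(w->buf + w->used, p, n);
        w->used += n;
        p += n;
        left -= (unsigned)n;
        if (w->used == w->cap) {
            w->dispatch(w->buf, w->used, w->ctx);
            w->used = 0;
        }
    }
    return (int)len;
}

/* Dispatches the final partial chunk, if any, and frees the handle. */
void pz_close(pz_writer *w) {
    if (w->used > 0) w->dispatch(w->buf, w->used, w->ctx);
    free(w->buf);
    free(w);
}

/* Example callback for this sketch: tallies the chunks it receives. */
typedef struct { size_t chunks, bytes; } pz_tally;

void pz_tally_chunk(const unsigned char *chunk, size_t len, void *ctx) {
    (void)chunk;
    pz_tally *t = ctx;
    t->chunks++;
    t->bytes += len;
}
```

With pz_open(2, 16, pz_tally_chunk, &tally), each chunk is 8 bytes: writing 3 bytes dispatches nothing, writing 7 more dispatches one full chunk, and pz_close sends the remaining 2 bytes. A larger memory budget means larger chunks and therefore a longer wait before the first chunk is handed off, which is the latency trade-off mentioned above.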

By the way, this would only be for compression. Decompression would be single-thread.

madler avatar Nov 29 '17 02:11 madler

"However I'm not sure what sort of interface people would be looking for."

I'd expect at least an API that would be a drop-in replacement for zlib, function by function. So, instead of #include<zlib.h> the programmer would do #include<pigzlib/zlib.h> and it would compile and work out of the box. Or he/she would change the compile settings from -I...include/zlib to -I...include/pigzlib and the same would be true.

But I'm not entirely sure full compatibility is possible, for example for the low-level primitives.

Then on top of that you could add some extra APIs to control/monitor resource usage, but that can be a second feature. This way, adoption of pigzlib would be quite easy and straightforward.

joaoe avatar Dec 10 '17 16:12 joaoe

It might make sense to have the library offer both a drop-in replacement and more customizable functions that would let developers specify things such as thread count. It would be pretty awesome to let OpenSSH use multi-threaded compression with a simple addition to the compilation process, for example. I think this is a must.

dschwartz783 avatar Mar 06 '18 12:03 dschwartz783

Or think about rsync with pigz compression, that would be awesome!

stilsch avatar Mar 06 '18 14:03 stilsch