zlib-ng
minigzip compatibility with gzip
Can some options of gzip be supported?
gzip accepts combined short options such as `-cd` in addition to `-c -d`. It also differs from minigzip in these options:
-f overwrites output instead of meaning "compress with Z_FILTERED"
-q suppresses warnings
-- ends option processing
...
-c, --stdout write on standard output, keep original files unchanged
-d, --decompress decompress
-f, --force force overwrite of output file and compress links
-h, --help give this help
-l, --list list compressed file contents
-L, --license display software license
-n, --no-name do not save or restore the original name and time stamp
-N, --name save or restore the original name and time stamp
-q, --quiet suppress all warnings
-r, --recursive operate recursively on directories
-S, --suffix=SUF use suffix SUF on compressed files
-t, --test test compressed file integrity
-v, --verbose verbose mode
-V, --version display version number
-1, --fast compress faster
-9, --best compress better
--rsyncable Make rsync-friendly archive
Addition of zcat and gunzip like wrappers would be nice too.
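One common way to provide zcat and gunzip style wrappers is to install the same binary under several names and choose the default mode from argv[0]. The following is only a minimal sketch of that idea; the `gz_mode` enum and the `mode_from_progname` helper are hypothetical names and do not exist in minigzip.

```c
#include <string.h>

/* Hypothetical operating modes; not part of minigzip. */
enum gz_mode { MODE_COMPRESS, MODE_DECOMPRESS, MODE_CAT };

/* Pick a default mode from the name the program was invoked as. */
static enum gz_mode mode_from_progname(const char *argv0)
{
    /* Strip any leading directory components. */
    const char *base = strrchr(argv0, '/');
    base = base ? base + 1 : argv0;

    if (strcmp(base, "gunzip") == 0)
        return MODE_DECOMPRESS;          /* behaves like "gzip -d" */
    if (strcmp(base, "zcat") == 0)
        return MODE_CAT;                 /* behaves like "gzip -cd" */
    return MODE_COMPRESS;                /* any other name: compress */
}
```

With this approach, `gunzip` and `zcat` could simply be symlinks to the main binary, and explicit command-line options would still override the default.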
minigzip doesn't try to be a replacement for gzip... There was talk a few days ago about making a full replacement that has all the features of gzip but internally uses zlib-ng, but whether we do it or not depends on how many people would actually use it.
At least good to know that it might be supported in the future.
Supporting -- as option to stop processing arguments would still be nice though.
I'm not saying yet when we will add new features to minigzip, as our main concern now is to have a stable release with as few bugs as possible... After all the bugs have been found and fixed, we will start adding new features.
I found a workaround in the meantime.
As pigz is linked against zlib, I was able to use it with zlib-ng instead. pigz should support all gzip options. When using pigz with multiple threads and zlib-ng instead of zlib, compression is still roughly 4 times faster :-).
# gzip.
$ time gzip -c -6 test.txt > test.txt.gz
real 7m12.472s
user 7m8.307s
sys 0m1.821s
# minigzip
$ time minigzip -c -6 test.txt > test.txt.gz
real 1m46.187s
user 1m43.936s
sys 0m1.301s
# pigz with 8 threads and zlib.
$ time pigz -p 8 -6 -c test.txt > test.txt.gz
real 0m59.679s
user 7m57.845s
sys 0m3.842s
# With zlib-ng
# pigz with 2 threads and zlib-ng
$ time pigz -p 2 -6 -c test.txt > test.txt.gz
real 1m4.082s
user 2m11.145s
sys 0m4.082s
# pigz with 4 threads and zlib-ng.
$ time pigz -p 4 -6 -c test.txt > test.txt.gz
real 0m32.674s
user 2m12.010s
sys 0m2.810s
# pigz with 6 threads and zlib-ng.
$ time pigz -p 6 -6 -c test.txt > test.txt.gz
real 0m22.818s
user 2m15.296s
sys 0m2.442s
# pigz with 8 threads and zlib-ng.
$ time pigz -p 8 -c test.txt > test.txt.gz
real 0m16.715s
user 2m14.397s
sys 0m2.336s
https://github.com/madler/pigz
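To make the comparison explicit, the wall-clock ("real") times above can be turned into speedup ratios. This is just the arithmetic, written as a small sketch; the helper names `to_seconds` and `speedup` are made up for this example.

```c
#include <stdio.h>

/* Convert a "XmY.YYYs" reading from time(1) into seconds. */
static double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* How many times faster the candidate is than the baseline. */
static double speedup(double baseline_s, double candidate_s)
{
    return baseline_s / candidate_s;
}

/* Print ratios for the "real" times reported in this thread. */
static void print_speedups(void)
{
    double gzip_s      = to_seconds(7, 12.472);  /* gzip -6 */
    double minigzip_s  = to_seconds(1, 46.187);  /* minigzip -6 */
    double pigz8_zlib  = to_seconds(0, 59.679);  /* pigz -p 8, zlib */
    double pigz8_zlibng = to_seconds(0, 16.715); /* pigz -p 8, zlib-ng */

    printf("gzip vs minigzip:            %.2fx\n",
           speedup(gzip_s, minigzip_s));
    printf("pigz/zlib vs pigz/zlib-ng:   %.2fx\n",
           speedup(pigz8_zlib, pigz8_zlibng));
}
```

With the figures above, this gives roughly 4.1 for gzip versus minigzip and roughly 3.6 for pigz with zlib versus pigz with zlib-ng at 8 threads.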
We also have a pigzbench repository that can be used to build and benchmark pigz against different zlib forks.
> minigzip doesn't try to be a replacement for gzip... There was talk a few days ago about making a full replacement that has all the features of gzip but internally uses zlib-ng, but whether we do it or not depends on how many people would actually use it.
It may be useful to note that NetBSD's gzip is actually a frontend to zlib (and FreeBSD's gzip is a fork of NetBSD's as well). It doesn't have all the options that GNU gzip has (e.g. --rsyncable), but it has most, and I'd expect at least the bulk of automated uses in Makefiles and build systems etc. to use options that exist in the BSDs. The code itself will probably have some BSD-specific quirks, but I'm sure it can be made portable (with or without the use of libbsd). So perhaps porting a widely deployed/battle-tested permissively licensed gzip implementation to zlib-ng's native API makes more sense than reimplementing gzip from scratch, or even with minigzip as the starting point.
Side-note: back in 2007, I forked NetBSD's code and heavily modified it, to create "zgz", part of Debian's pristine-tar package. Given how long ago this was, I don't remember what the porting effort entailed; it was also a fork of NetBSD's 20060927 revision, so it's doubtful whether any lessons learned back then would be relevant today anyway. Several others (Joey Hess, Josh Triplett etc.) have also modified the code since. The code is probably unrecognizable compared to the original NetBSD version, and it is also quite domain-specific, as its purpose is to offer all kinds of "expert" flags that can be used to simulate various archivers and recreate archives found in the wild. I'm not sure if that's of any use here, but I'm mentioning it for posterity and in case it gives you some insight into the variety of gzip archives that exist out there.
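As a rough illustration of where that variety shows up, the fixed 10-byte header of a gzip member (RFC 1952) already carries several fields that differ between tools and invocations: the flag byte, the modification time, the extra flags and the OS byte. The sketch below only decodes those fields; `gz_header_info` and `parse_gz_header` are hypothetical names and are not taken from zgz or NetBSD's gzip.

```c
#include <stddef.h>
#include <stdint.h>

/* Bits of the FLG byte, as defined in RFC 1952. */
enum {
    GZ_FTEXT    = 0x01,
    GZ_FHCRC    = 0x02,
    GZ_FEXTRA   = 0x04,
    GZ_FNAME    = 0x08,   /* original file name follows (see -N / -n) */
    GZ_FCOMMENT = 0x10
};

/* Hypothetical container for the fixed header fields. */
struct gz_header_info {
    uint8_t  flags;   /* FLG */
    uint32_t mtime;   /* MTIME, seconds since the epoch, 0 if unset */
    uint8_t  xfl;     /* XFL, e.g. 2 = best compression, 4 = fastest */
    uint8_t  os;      /* OS, e.g. 3 = Unix, 255 = unknown */
};

/*
 * Decode the fixed part of a gzip header.
 * Returns 0 on success, -1 if the buffer is too short or does not
 * start with the gzip magic bytes and the deflate method (CM = 8).
 */
static int parse_gz_header(const unsigned char *buf, size_t len,
                           struct gz_header_info *out)
{
    if (len < 10 || buf[0] != 0x1f || buf[1] != 0x8b || buf[2] != 8)
        return -1;

    out->flags = buf[3];
    /* MTIME is stored little-endian. */
    out->mtime = (uint32_t)buf[4]
               | (uint32_t)buf[5] << 8
               | (uint32_t)buf[6] << 16
               | (uint32_t)buf[7] << 24;
    out->xfl = buf[8];
    out->os  = buf[9];
    return 0;
}
```

Two archives with identical compressed data can still differ in any of these fields, which is one reason a tool that tries to recreate archives byte for byte needs so many knobs.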
I think one challenge with existing gzip source code is that it is not zlib licensed. In this repository we only allow zlib licensed source code.
> I think one challenge with existing gzip source code is that it is not zlib licensed. In this repository we only allow zlib licensed source code.
GNU gzip is under GPLv3, and including code under a strong copyleft license would indeed be a pretty big departure from the permissiveness of the zlib license that this project uses - I concur.
The NetBSD/FreeBSD gzip code that I mentioned above on the other hand, is under the 2-clause BSD license. It's not the same as the zlib license, but it's not very far either.
It would be best if the *BSD version were dual-licensed or an "official" zlib-ng adaptation were released... Might be worth discussing with the maintainers...
> It would be best if the *BSD version were dual-licensed or an "official" zlib-ng adaptation were released... Might be worth discussing with the maintainers...
(A year and a half later, reviving this)
I don't have any objections per se to this (especially given I'm not the one who would be doing the work ;), but I am also curious what problem it would solve. The 2-clause BSD license is a fine license, about as permissive as the zlib license is, OSI-approved, FSF-approved, and popular enough to have been vetted already by corporate legal departments. Is there a perceived scenario where one would be OK with the zlib license but not the 2-clause BSD for their project?
Don't get me wrong, I appreciate homogeneity and consistency! But this would come at a significant expense of relicensing, dragging into the conversation potentially dozens of contributors, so I believe it's worthwhile to be asking about the benefits...
For what it's worth, I tried compiling gzip (usr.bin/gzip) from FreeBSD main (https://github.com/freebsd/freebsd-src/commit/0f8b2ba6c629237e4ddd7a72f7c22f687208060d) on Debian unstable, with zlib-ng.
I had to:
- Add this at the top (all BSD-isms):
#define nitems(x) (sizeof((x)) / sizeof((x)[0]))
#define __unused __attribute__((__unused__))
#define SIGINFO SIGUSR1
#define EFTYPE EINVAL
#define __COPYRIGHT(_s) static const char copyright[] __unused = _s
- Comment out a single line that checks `sb.st_flags` and calls `fchflags`.
- Build with:
gcc -Wall -isystem /usr/include/bsd -DLIBBSD_OVERLAY -DNO_BZIP2_SUPPORT -DNO_XZ_SUPPORT -DNO_LZ_SUPPORT -DNO_ZSTD_SUPPORT -o gzip gzip.c -lz -lbsd
After that, the resulting binary, linked against zlib-ng, just works in my (limited) testing.
To productionize this, one could: a) Use unifdef to strip bzip2/xz/LZMA/zstd support from the source code. b) Add the necessary configure options and/or ifdefs so that the two modifications above are conditional on Linux/glibc/etc. c) Integrate into the build system.
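Step b) could look roughly like the following compatibility shim. This is only a sketch under assumptions: the `HAVE_STRUCT_STAT_ST_FLAGS` macro is the kind of name a configure check would typically define, but nothing defines it today, and `copy_file_flags` is a hypothetical wrapper around the one line that was commented out.

```c
/* System headers first, so the __unused macro below cannot interfere
 * with identifiers used inside them. */
#include <sys/stat.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Only supply the BSD-isms where the platform does not provide them. */
#ifndef nitems
#define nitems(x) (sizeof((x)) / sizeof((x)[0]))
#endif
#ifndef __unused
#define __unused __attribute__((__unused__))
#endif
#ifndef SIGINFO
#define SIGINFO SIGUSR1
#endif
#ifndef EFTYPE
#define EFTYPE EINVAL
#endif
#ifndef __COPYRIGHT
#define __COPYRIGHT(_s) static const char copyright[] __unused = _s
#endif

/*
 * Hypothetical wrapper: copy BSD file flags where struct stat has
 * st_flags, and do nothing elsewhere (e.g. Linux/glibc).
 */
static int copy_file_flags(int fd, const struct stat *sb)
{
#ifdef HAVE_STRUCT_STAT_ST_FLAGS   /* assumed to come from a configure check */
    return fchflags(fd, sb->st_flags);
#else
    (void)fd;
    (void)sb;
    return 0;
#endif
}
```

Guarding each macro with `#ifndef` keeps the same source buildable on the BSDs, where these definitions already exist.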
The libbsd dependency can be further reduced if one were to inline a handful of functions (getprogname, le32dec, strlcpy etc.), but I would not recommend it.
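For a sense of scale, inlined replacements for two of those functions might look like the sketch below. The `compat_` names are made up for this example to avoid clashing with libbsd or the system libc, and the bodies are straightforward reimplementations, not code copied from libbsd or FreeBSD.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode a 32-bit little-endian value, in the spirit of le32dec(). */
static uint32_t compat_le32dec(const void *pp)
{
    const unsigned char *p = pp;

    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/*
 * Copy src into dst, always NUL-terminating when dstsize > 0, in the
 * spirit of strlcpy(). Returns strlen(src), so a return value
 * >= dstsize means the copy was truncated.
 */
static size_t compat_strlcpy(char *dst, const char *src, size_t dstsize)
{
    size_t srclen = strlen(src);

    if (dstsize != 0) {
        size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```

Each function is small, but every inlined copy is one more piece of code to maintain, which is a reasonable argument for keeping the libbsd dependency instead.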