Pinch
Associated blog post: Use Fast Data Algorithms.
A toolkit for rapidly compressing, hashing and otherwise making data smaller
when you can't necessarily install software (e.g. CI/CD environments) but you
can run Docker containers. The resulting pinch container takes up about 15MiB
of disk space and contains multiple excellent compression/hashing utilities
from Yann Collet in addition to a few other tools.
- zstd v1.4.9 for great compression
- lz4 v1.9.3 for very fast compression
- xxHash v0.8.0 for ridiculously fast hashing
- BLAKE3 v0.3.7 for ridiculously fast cryptographic hashing
- age v0.3.7 for reasonably fast encryption/decryption
Note that the licenses of these tools are included in the
/usr/local/share/licenses folder within the container, except for BLAKE3,
which is released into the public domain.
Installation
You can build the Docker image (or grab it from Docker Hub) with
$ docker build -t pinch .
At this point you can run the included pinch script.
Or you can have the Makefile build the container and install the pinch wrapper
locally:
$ make
$ make PREFIX=~/.local install
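If you'd rather not install the wrapper at all, you can call the image
directly. As a rough sketch (the flags here are assumptions; check the bundled
pinch script for the real invocation), the wrapper essentially forwards its
arguments to docker run with stdin/stdout attached and the current directory
mounted:
# Roughly what the pinch wrapper does under the hood (assumed flags;
# see the bundled script for the exact invocation)
$ docker run --rm -i -v "$(pwd):/data" -w /data pinch zstd --help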
Running
Use the included pinch wrapper script to run one of the included tools on
some files. Pass the tool you want as an argument, for example:
# For excellent data compression
$ pinch zstd --help
# For fast data compression
$ pinch lz4 --help
# For data hashing/validation
$ pinch xxhsum --help
# For crypto data hashing/validation
$ pinch b3sum --help
# For encrypting/decrypting files
$ pinch age --help
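Because each tool reads stdin and writes stdout, they chain together nicely.
As a hedged sketch (the file names, <recipient> and <identity-file> below are
placeholders, not part of pinch), you could compress and encrypt a file in one
pipeline:
# Compress, then encrypt to an age recipient
$ pinch zstd -c notes.txt | pinch age -r <recipient> > notes.txt.zst.age
# Decrypt with your identity file, then decompress
$ pinch age -d -i <identity-file> notes.txt.zst.age | pinch zstd -d -c > notes.txt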
Compression (zstd and lz4)
pinch bundles a few compression tools for quick access.
zstd is almost always the correct choice for good compression. In my
experience it uses significantly less compute resources (and
compresses/decompresses faster) than similarly "good" algorithms like gzip or
xz. Of particular interest, this container includes a version of zstd with the
--adapt functionality (see the v1.3.6 release notes). This means you can do
streaming uploads and downloads of massive datasets while the compression
level adaptively adjusts to the available bandwidth and CPU for maximum
performance.
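For instance (a sketch; the bucket path is a placeholder), the same idea works
in reverse when pulling a compressed object back down:
# Stream a compressed object out of S3 and decompress it on the fly
$ aws s3 cp s3://<path> - | pinch zstd -d -c > large_file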
lz4 is almost always the correct choice for fast compression. In my
experience it achieves similar ratios to other "fast" algorithms like snappy
but uses less compute.
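For example (a sketch, with a placeholder directory name), lz4 shines when you
want to wrap up a directory quickly and the ratio matters less than speed:
# Fast, streaming compression of a directory
$ tar -cf - <dir> | pinch lz4 -c > archive.tar.lz4
# And back again
$ pinch lz4 -d -c archive.tar.lz4 | tar -xf -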
Hashing (xxHash)
For data verification, there usually isn't a faster choice than taking the
xxHash of your data.
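For example, a quick sketch of recording a checksum and verifying it later
(xxhsum follows the familiar md5sum-style interface, including a check mode):
# Record a checksum
$ pinch xxhsum large_file > large_file.xxh
# Verify it later
$ pinch xxhsum --check large_file.xxh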
Advanced Examples
Some of the things you can do:
You can upload a large file to S3 as fast as possible
$ cat large_file | pinch zstd -c --adapt | aws s3 cp - s3://<path>
See what kind of awesome compression ratio you'll get at level 10
$ wc -c large_file
$ cat large_file | pinch zstd -c -10 | wc -c
Read a man page of one of the tools
$ pinch man zstd
$ pinch man xxh64sum
$ pinch man lz4