kanzi-cpp
kanzi-cpp copied to clipboard
Fast lossless data compression in C++
kanzi
Kanzi is a modern, modular, portable and efficient lossless data compressor implemented in C++.
- modern: state-of-the-art algorithms are implemented and multi-core CPUs can take advantage of the built-in multi-threading.
- modular: entropy codec and a combination of transforms can be provided at runtime to best match the kind of data to compress.
- portable: many OSes, compilers and C++ versions are supported (see below).
- expandable: clean design with heavy use of interfaces as contracts makes integrating and expanding the code easy. No dependencies.
- efficient: the code is optimized for efficiency (trade-off between compression ratio and speed).
Unlike the most common lossless data compressors, Kanzi uses a variety of different compression algorithms and supports a wider range of compression ratios as a result. Kanzi is also multithreadead by design and generates a seekable bit stream. It is not compatible with standard compression formats. Kanzi is a lossless data compressor, not an archiver. It uses checksums (optional but recommended) to validate data integrity but does not have a mechanism for data recovery. It also lacks data deduplication across files.
For more details, check https://github.com/flanglet/kanzi/wiki.
There is a Java implementation available here: https://github.com/flanglet/kanzi
There is Go implementation available here: https://github.com/flanglet/kanzi-go
For more details, check https://github.com/flanglet/kanzi-cpp/wiki.
Credits
Matt Mahoney, Yann Collet, Jan Ondrus, Yuta Mori, Ilya Muravyov, Neal Burns, Fabian Giesen, Jarek Duda, Ilya Grebnov
Disclaimer
Use at your own risk. Always keep a backup of your files.
Silesia corpus benchmark
i7-7700K @4.20GHz, 32GB RAM, Ubuntu 22.04
clang++ 14.0.0, tcmalloc
Kanzi version 2.1 C++ implementation. Block size is 100 MB.
Compressor | Encoding (sec) | Decoding (sec) | Size |
---|---|---|---|
Original | 211,938,580 | ||
Kanzi -l 1 -j 1 | 1.1 | 0.5 | 69,399,477 |
Kanzi -l 1 -j 6 | 0.4 | 0.2 | 69,399,477 |
Pigz 2.6 -5 -p6 | 1.0 | 0.7 | 69,170,603 |
Gzip 1.10 -5 | 4.8 | 1.0 | 69,143,980 |
Zstd 1.5.3 -2 --long=30 | 0.9 | 0.5 | 68,694,316 |
Zstd 1.5.3 -2 -T6 --long=30 | 0.4 | 0.3 | 68,694,316 |
Brotli 1.0.9 -2 --large_window=30 | 1.5 | 0.8 | 68,033,377 |
Pigz 2.6 -9 -p6 | 3.0 | 0.6 | 67,656,836 |
Gzip 1.10 -9 | 15.5 | 1.0 | 67,631,990 |
Kanzi -l 2 -j 1 | 2.3 | 0.7 | 63,808,747 |
Kanzi -l 2 -j 6 | 0.9 | 0.3 | 63,808,747 |
Brotli 1.0.9 -4 --large_window=30 | 4.1 | 0.7 | 64,267,169 |
Kanzi -l 3 -j 1 | 3.5 | 1.3 | 59,199,795 |
Kanzi -l 3 -j 6 | 1.2 | 0.4 | 59,199,795 |
Zstd 1.5.3 -9 --long=30 | 3.7 | 0.3 | 59,272,590 |
Zstd 1.5.3 -9 -T6 --long=30 | 2.3 | 0.3 | 59,272,590 |
Orz 1.5.0 | 7.7 | 2.0 | 57,564,831 |
Brotli 1.0.9 -9 --large_window=30 | 36.7 | 0.7 | 56,232,817 |
Lzma 5.2.2 -3 | 24.1 | 2.6 | 55,743,540 |
Kanzi -l 4 -j 1 | 6.2 | 4.2 | 54,998,198 |
Kanzi -l 4 -j 6 | 2.0 | 1.2 | 54,998,198 |
Bzip2 1.0.6 -9 | 14.9 | 5.2 | 54,506,769 |
Zstd 1.5.3 -19 --long=30 | 62.0 | 0.3 | 52,828,057 |
Zstd 1.5.3 -19 -T6 --long=30 | 62.0 | 0.4 | 52,828,057 |
Kanzi -l 5 -j 1 | 11.3 | 4.5 | 51,760,244 |
Kanzi -l 5 -j 6 | 3.6 | 1.5 | 51,760,244 |
Brotli 1.0.9 --large_window=30 | 356.2 | 0.9 | 49,383,136 |
Lzma 5.2.2 -9 | 65.6 | 2.5 | 48,780,457 |
Kanzi -l 6 -j 1 | 13.6 | 6.2 | 48,068,000 |
Kanzi -l 6 -j 6 | 4.2 | 2.1 | 48,068,000 |
bsc 3.2.3 -b100 -T -t | 8.8 | 6.0 | 46,932,394 |
bsc 3.2.3 -b100 | 5.4 | 4.9 | 46,932,394 |
BCM 1.65 -b100 | 15.5 | 21.1 | 46,506,716 |
Kanzi -l 7 -j 1 | 16.7 | 11.1 | 46,447,003 |
Kanzi -l 7 -j 6 | 5.2 | 3.7 | 46,447,003 |
Tangelo 2.4 | 83.2 | 85.9 | 44,862,127 |
zpaq v7.14 m4 t1 | 107.3 | 112.2 | 42,628,166 |
zpaq v7.14 m4 t12 | 108.1 | 111.5 | 42,628,166 |
Kanzi -l 8 -j 1 | 47.8 | 49.4 | 41,821,127 |
Kanzi -l 8 -j 6 | 15.8 | 15.5 | 41,821,127 |
Tangelo 2.0 | 302.0 | 310.9 | 41,267,068 |
Kanzi -l 9 -j 1 | 72.4 | 74.5 | 40,361,391 |
Kanzi -l 9 -j 6 | 26.1 | 26.9 | 40,361,391 |
zpaq v7.14 m5 t1 | 343.1 | 352.0 | 39,112,924 |
zpaq v7.14 m5 t12 | 344.3 | 350.4 | 39,112,924 |
enwik8
i7-7700K @4.20GHz, 32GB RAM, Ubuntu 22.04
clang++ 14.0.0, tcmalloc
Kanzi version 2.1 C++ implementation. Block size is 100 MB. 1 thread
Compressor | Encoding (sec) | Decoding (sec) | Size |
---|---|---|---|
Original | 100,000,000 | ||
Kanzi -l 1 -j 1 | 0.78 | 0.33 | 37,969,539 |
Kanzi -l 2 -j 1 | 1.65 | 0.56 | 30,953,719 |
Kanzi -l 3 -j 1 | 2.02 | 0.80 | 27,362,969 |
Kanzi -l 4 -j 1 | 3.37 | 2.18 | 25,670,924 |
Kanzi -l 5 -j 1 | 5.14 | 1.82 | 22,490,875 |
Kanzi -l 6 -j 1 | 6.88 | 2.80 | 21,232,300 |
Kanzi -l 7 -j 1 | 8.80 | 5.02 | 20,935,519 |
Kanzi -l 8 -j 1 | 18.84 | 18.95 | 19,671,786 |
Kanzi -l 9 -j 1 | 28.25 | 29.03 | 19,097,946 |
Build Kanzi
The C++ code can be built on Windows with Visual Studio, Linux, macOS and Android with g++ and/or clang++. There are no dependencies. Porting to other operating systems should be straightforward.
Visual Studio 2008
Unzip the file "Kanzi_VS2008.zip" in place. The project generates a Windows 32 binary. Multithreading is not supported with this version.
Visual Studio 2017
Unzip the file "Kanzi_VS2017.zip" in place. The project generates a Windows 64 binary. Multithreading is supported with this version.
mingw-w64
Go to the source directory and run 'make clean && mingw32-make.exe'. The Makefile contains all the necessary targets. Tested successfully on Win64 with mingw-w64 g++ 8.1.0. Multithreading is supported. Compiled successfully with C++11, C++14, C++17.
Linux
Go to the source directory and run 'make clean && make'. The Makefile contains all the necessary targets. Build successfully on Ubuntu with g++ 8.4.0, g++ 9.3.0, g++ 10.3.0, clang++ 10.0.0 and icc 19.0.0.117. Multithreading is supported with g++ version 5.0.0 or newer. Compiled successfully with C++11, C++14, C++17, C++20.
BSD
The makefile uses the gnu-make syntax. First, make sure gmake is present (or install it: 'pkg_add gmake'). Go to the source directory and run 'gmake clean && gmake'. The Makefile contains all the necessary targets.