kanzi-cpp
Fast lossless data compression in C++
kanzi
Kanzi is a modern, modular, portable, expandable and efficient lossless data compressor implemented in C++.
- modern: state-of-the-art algorithms are implemented, and the built-in multi-threading takes advantage of multi-core CPUs.
- modular: the entropy codec and a combination of transforms can be selected at runtime to best match the kind of data to compress.
- portable: many OSes, compilers and C++ versions are supported (see below).
- expandable: a clean design with heavy use of interfaces as contracts makes the code easy to integrate and extend. There are no dependencies.
- efficient: the code is optimized for efficiency (trade-off between compression ratio and speed).
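
To illustrate the "modular" and "expandable" points, here is a minimal sketch of how reversible transforms behind a common interface can be chained at runtime. Everything in it is hypothetical and chosen for illustration only: the names `ByteTransform`, `DeltaTransform`, `XorTransform`, `makeTransform` and `TransformPipeline`, as well as the stage names `"delta"` and `"xor"`, are assumptions and are not Kanzi's actual API or transforms.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<uint8_t> Bytes;

// Hypothetical contract for a reversible byte transform (not Kanzi's real interface).
class ByteTransform {
public:
    virtual ~ByteTransform() {}
    virtual Bytes forward(const Bytes& in) const = 0;
    virtual Bytes inverse(const Bytes& in) const = 0;
};

// Toy stage: store each byte as the difference from the previous one (mod 256).
class DeltaTransform : public ByteTransform {
public:
    Bytes forward(const Bytes& in) const override {
        Bytes out;
        out.reserve(in.size());
        uint8_t prev = 0;
        for (uint8_t b : in) {
            out.push_back(static_cast<uint8_t>(b - prev));
            prev = b;
        }
        return out;
    }
    Bytes inverse(const Bytes& in) const override {
        Bytes out;
        out.reserve(in.size());
        uint8_t prev = 0;
        for (uint8_t d : in) {
            prev = static_cast<uint8_t>(d + prev);  // rebuild the original byte
            out.push_back(prev);
        }
        return out;
    }
};

// Toy stage: XOR every byte with a fixed mask (the operation is its own inverse).
class XorTransform : public ByteTransform {
public:
    explicit XorTransform(uint8_t mask) : mask_(mask) {}
    Bytes forward(const Bytes& in) const override {
        Bytes out(in);
        for (uint8_t& b : out) b ^= mask_;
        return out;
    }
    Bytes inverse(const Bytes& in) const override { return forward(in); }
private:
    uint8_t mask_;
};

// Maps a (hypothetical) stage name to an implementation at runtime.
std::unique_ptr<ByteTransform> makeTransform(const std::string& name) {
    if (name == "delta") return std::unique_ptr<ByteTransform>(new DeltaTransform());
    if (name == "xor") return std::unique_ptr<ByteTransform>(new XorTransform(0x5A));
    throw std::invalid_argument("unknown transform: " + name);
}

// Applies the stages in order and undoes them in reverse order.
class TransformPipeline {
public:
    void add(std::unique_ptr<ByteTransform> stage) { stages_.push_back(std::move(stage)); }
    Bytes forward(Bytes data) const {
        for (std::size_t i = 0; i < stages_.size(); ++i) data = stages_[i]->forward(data);
        return data;
    }
    Bytes inverse(Bytes data) const {
        for (std::size_t i = stages_.size(); i > 0; --i) data = stages_[i - 1]->inverse(data);
        return data;
    }
private:
    std::vector<std::unique_ptr<ByteTransform>> stages_;
};
```

A caller would build a pipeline with `pipeline.add(makeTransform("delta"))`, run `forward` before entropy coding and `inverse` after decoding. Adding a new stage only requires another class that honours the same contract.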
 
Unlike most common lossless data compressors, Kanzi uses a variety of compression algorithms and, as a result, supports a wider range of compression ratios. Kanzi is also multithreaded by design and generates a seekable bit stream. It is not compatible with standard compression formats. Kanzi is a lossless data compressor, not an archiver: it uses checksums (optional but recommended) to validate data integrity but has no mechanism for data recovery, and it does not deduplicate data across files.
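
The difference between validating integrity and recovering data can be sketched as follows. This is an illustration under stated assumptions, not Kanzi's implementation: FNV-1a is used as a simple stand-in hash, the fixed-size block layout is assumed, and the function names `fnv1a32`, `blockChecksums` and `firstCorruptBlock` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// FNV-1a (32-bit): a simple stand-in hash, not necessarily what Kanzi uses.
uint32_t fnv1a32(const uint8_t* data, std::size_t len) {
    uint32_t h = 2166136261u;  // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 16777619u;        // FNV prime
    }
    return h;
}

// Computes one checksum per fixed-size block (the block layout is an assumption).
std::vector<uint32_t> blockChecksums(const std::vector<uint8_t>& data, std::size_t blockSize) {
    std::vector<uint32_t> sums;
    if (blockSize == 0) return sums;
    for (std::size_t off = 0; off < data.size(); off += blockSize) {
        const std::size_t len = std::min(blockSize, data.size() - off);
        sums.push_back(fnv1a32(data.data() + off, len));
    }
    return sums;
}

// Returns the index of the first block whose checksum does not match the
// expected one, or -1 if every block matches. This only detects corruption:
// a checksum carries no redundancy that could be used to repair the block.
long firstCorruptBlock(const std::vector<uint8_t>& data, std::size_t blockSize,
                       const std::vector<uint32_t>& expected) {
    const std::vector<uint32_t> actual = blockChecksums(data, blockSize);
    const std::size_t n = std::min(actual.size(), expected.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (actual[i] != expected[i]) return static_cast<long>(i);
    }
    // A different number of blocks also counts as a mismatch.
    if (actual.size() != expected.size()) return static_cast<long>(n);
    return -1;
}
```

A decoder built this way can report which block is damaged, but the only remedy is to restore the data from a backup.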
For more details, check https://github.com/flanglet/kanzi/wiki.
There is a Java implementation available here: https://github.com/flanglet/kanzi
There is a Go implementation available here: https://github.com/flanglet/kanzi-go
For more details, check https://github.com/flanglet/kanzi-cpp/wiki.
Credits
Matt Mahoney, Yann Collet, Jan Ondrus, Yuta Mori, Ilya Muravyov, Neal Burns, Fabian Giesen, Jarek Duda, Ilya Grebnov
Disclaimer
Use at your own risk. Always keep a backup of your files.
Silesia corpus benchmark
i7-7700K @4.20GHz, 32GB RAM, Ubuntu 22.04
clang++ 14.0.0, tcmalloc
Kanzi version 2.1 C++ implementation. Block size is 100 MB.
| Compressor | Encoding (sec) | Decoding (sec) | Size | 
|---|---|---|---|
| Original | | | 211,938,580 |
| Kanzi -l 1 -j 1 | 1.1 | 0.5 | 69,399,477 | 
| Kanzi -l 1 -j 6 | 0.4 | 0.2 | 69,399,477 | 
| Pigz 2.6 -5 -p6 | 1.0 | 0.7 | 69,170,603 | 
| Gzip 1.10 -5 | 4.8 | 1.0 | 69,143,980 | 
| Zstd 1.5.3 -2 --long=30 | 0.9 | 0.5 | 68,694,316 | 
| Zstd 1.5.3 -2 -T6 --long=30 | 0.4 | 0.3 | 68,694,316 | 
| Brotli 1.0.9 -2 --large_window=30 | 1.5 | 0.8 | 68,033,377 | 
| Pigz 2.6 -9 -p6 | 3.0 | 0.6 | 67,656,836 | 
| Gzip 1.10 -9 | 15.5 | 1.0 | 67,631,990 | 
| Kanzi -l 2 -j 1 | 2.3 | 0.7 | 63,808,747 | 
| Kanzi -l 2 -j 6 | 0.9 | 0.3 | 63,808,747 | 
| Brotli 1.0.9 -4 --large_window=30 | 4.1 | 0.7 | 64,267,169 | 
| Kanzi -l 3 -j 1 | 3.5 | 1.3 | 59,199,795 | 
| Kanzi -l 3 -j 6 | 1.2 | 0.4 | 59,199,795 | 
| Zstd 1.5.3 -9 --long=30 | 3.7 | 0.3 | 59,272,590 | 
| Zstd 1.5.3 -9 -T6 --long=30 | 2.3 | 0.3 | 59,272,590 | 
| Orz 1.5.0 | 7.7 | 2.0 | 57,564,831 | 
| Brotli 1.0.9 -9 --large_window=30 | 36.7 | 0.7 | 56,232,817 | 
| Lzma 5.2.2 -3 | 24.1 | 2.6 | 55,743,540 | 
| Kanzi -l 4 -j 1 | 6.2 | 4.2 | 54,998,198 | 
| Kanzi -l 4 -j 6 | 2.0 | 1.2 | 54,998,198 | 
| Bzip2 1.0.6 -9 | 14.9 | 5.2 | 54,506,769 | 
| Zstd 1.5.3 -19 --long=30 | 62.0 | 0.3 | 52,828,057 | 
| Zstd 1.5.3 -19 -T6 --long=30 | 62.0 | 0.4 | 52,828,057 | 
| Kanzi -l 5 -j 1 | 11.3 | 4.5 | 51,760,244 | 
| Kanzi -l 5 -j 6 | 3.6 | 1.5 | 51,760,244 | 
| Brotli 1.0.9 --large_window=30 | 356.2 | 0.9 | 49,383,136 | 
| Lzma 5.2.2 -9 | 65.6 | 2.5 | 48,780,457 | 
| Kanzi -l 6 -j 1 | 13.6 | 6.2 | 48,068,000 | 
| Kanzi -l 6 -j 6 | 4.2 | 2.1 | 48,068,000 | 
| bsc 3.2.3 -b100 -T -t | 8.8 | 6.0 | 46,932,394 | 
| bsc 3.2.3 -b100 | 5.4 | 4.9 | 46,932,394 | 
| BCM 1.65 -b100 | 15.5 | 21.1 | 46,506,716 | 
| Kanzi -l 7 -j 1 | 16.7 | 11.1 | 46,447,003 | 
| Kanzi -l 7 -j 6 | 5.2 | 3.7 | 46,447,003 | 
| Tangelo 2.4 | 83.2 | 85.9 | 44,862,127 | 
| zpaq v7.14 m4 t1 | 107.3 | 112.2 | 42,628,166 | 
| zpaq v7.14 m4 t12 | 108.1 | 111.5 | 42,628,166 | 
| Kanzi -l 8 -j 1 | 47.8 | 49.4 | 41,821,127 | 
| Kanzi -l 8 -j 6 | 15.8 | 15.5 | 41,821,127 | 
| Tangelo 2.0 | 302.0 | 310.9 | 41,267,068 | 
| Kanzi -l 9 -j 1 | 72.4 | 74.5 | 40,361,391 | 
| Kanzi -l 9 -j 6 | 26.1 | 26.9 | 40,361,391 | 
| zpaq v7.14 m5 t1 | 343.1 | 352.0 | 39,112,924 | 
| zpaq v7.14 m5 t12 | 344.3 | 350.4 | 39,112,924 | 
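
The figures in the table can be turned into derived metrics with a few lines of arithmetic. The helpers below are a small sketch (the function names are hypothetical, and 1 MB is assumed to be 1,000,000 bytes). Because the timings in the table are rounded, the results are approximate.

```cpp
#include <cassert>
#include <cstdint>

// Compressed size as a percentage of the original size (lower is better).
double sizePercent(uint64_t compressedBytes, uint64_t originalBytes) {
    assert(originalBytes > 0);
    return 100.0 * static_cast<double>(compressedBytes) / static_cast<double>(originalBytes);
}

// Throughput in MB/s, measured on the uncompressed size (1 MB = 1,000,000 bytes).
double throughputMBps(uint64_t originalBytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(originalBytes) / 1.0e6 / seconds;
}

// Speed-up of a multi-threaded run relative to the single-threaded run.
double speedup(double singleThreadSeconds, double multiThreadSeconds) {
    assert(multiThreadSeconds > 0.0);
    return singleThreadSeconds / multiThreadSeconds;
}
```

For example, with the `Kanzi -l 5` rows, `sizePercent(51760244, 211938580)` is about 24.4, and `speedup(11.3, 3.6)` gives an encoding speed-up of roughly 3.1 when going from 1 to 6 jobs.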
enwik8
i7-7700K @4.20GHz, 32GB RAM, Ubuntu 22.04
clang++ 14.0.0, tcmalloc
Kanzi version 2.1 C++ implementation. Block size is 100 MB. 1 thread
| Compressor | Encoding (sec) | Decoding (sec) | Size | 
|---|---|---|---|
| Original | | | 100,000,000 |
| Kanzi -l 1 -j 1 | 0.78 | 0.33 | 37,969,539 | 
| Kanzi -l 2 -j 1 | 1.65 | 0.56 | 30,953,719 | 
| Kanzi -l 3 -j 1 | 2.02 | 0.80 | 27,362,969 | 
| Kanzi -l 4 -j 1 | 3.37 | 2.18 | 25,670,924 | 
| Kanzi -l 5 -j 1 | 5.14 | 1.82 | 22,490,875 | 
| Kanzi -l 6 -j 1 | 6.88 | 2.80 | 21,232,300 | 
| Kanzi -l 7 -j 1 | 8.80 | 5.02 | 20,935,519 | 
| Kanzi -l 8 -j 1 | 18.84 | 18.95 | 19,671,786 | 
| Kanzi -l 9 -j 1 | 28.25 | 29.03 | 19,097,946 | 
Build Kanzi
The C++ code can be built on Windows with Visual Studio, Linux, macOS and Android with g++ and/or clang++. There are no dependencies. Porting to other operating systems should be straightforward.
Visual Studio 2008
Unzip the file "Kanzi_VS2008.zip" in place. The project generates a 32-bit Windows binary. Multithreading is not supported with this version.
Visual Studio 2017
Unzip the file "Kanzi_VS2017.zip" in place. The project generates a 64-bit Windows binary. Multithreading is supported with this version.
mingw-w64
Go to the source directory and run 'make clean && mingw32-make.exe'. The Makefile contains all the necessary targets. Tested successfully on Win64 with mingw-w64 g++ 8.1.0. Multithreading is supported. Compiled successfully with C++11, C++14, C++17.
Linux
Go to the source directory and run 'make clean && make'. The Makefile contains all the necessary targets. Built successfully on Ubuntu with g++ 8.4.0, g++ 9.3.0, g++ 10.3.0, clang++ 10.0.0 and icc 19.0.0.117. Multithreading is supported with g++ version 5.0.0 or newer. Compiled successfully with C++11, C++14, C++17, C++20.
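
When trying a compiler or C++ standard not listed above, it can help to confirm what the toolchain actually targets before building. The helpers below are a generic sketch and are not part of the Kanzi sources; the function names are hypothetical. They rely only on the standard `__cplusplus` macro and `std::thread::hardware_concurrency()`.

```cpp
#include <string>
#include <thread>

// Maps a __cplusplus value to a readable standard name.
// Note: some compilers report non-final values for draft modes, and MSVC
// reports 199711L unless /Zc:__cplusplus is set, so treat this as a hint.
std::string standardName(long value) {
    if (value >= 202002L) return "C++20 or newer";
    if (value >= 201703L) return "C++17";
    if (value >= 201402L) return "C++14";
    if (value >= 201103L) return "C++11";
    return "pre-C++11";
}

// Standard targeted by the current compilation.
std::string currentStandard() {
    return standardName(__cplusplus);
}

// Number of hardware threads reported by the standard library
// (may be 0 if the value cannot be determined).
unsigned hardwareThreads() {
    return std::thread::hardware_concurrency();
}
```

Printing `currentStandard()` and `hardwareThreads()` from a small test program gives a quick indication of whether the build environment matches one of the configurations listed here.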
BSD
The Makefile uses GNU make syntax. First, make sure gmake is present (or install it: 'pkg_add gmake'). Then go to the source directory and run 'gmake clean && gmake'. The Makefile contains all the necessary targets.