
mini-LZ library

mini-LZ library (mlz) v0.2h (c) Martin Sedlak 2016-2018

a simple, portable LZ-only codec written in plain C, distributed under the Boost Software License (this means you don't have to include the license with binary distributions)

performance on Silesia corpus (single ~207MB block) on my i7:
    compression (max level): 4-5MB/sec (thus very slow)
    decompression: ~500-700MB/sec, heavily depends on compiler optimizer (reasonably fast but 3x slower than LZ4)
    compressed size: ~71MB (worse than zlib but still relatively good)

(note: assuming MB = 1024^2 bytes)

interface:
    checked or unchecked decompression (unchecked means it doesn't check for buffer bounds overflow)
    single-file unchecked decompression (mlz_dec_mini.h)
    simple interface for streaming codec
    streams now have a 2-byte header by default to store encoding params

simple example streaming commandline tool in mlzc.c (just define MLZ_COMMANDLINE_TOOL)

streaming compression can be multithreaded now (define MLZ_THREADS), but speedup is lousy (~2.5x with 4 cores)
streaming decompression of independent blocks can be multithreaded now as well

for basic block codec, the following files will do:
    mlz_common.h
    mlz_enc.c, mlz_enc.h for compression
    mlz_dec.c, mlz_dec.h for decompression

see headers for detailed description

basic block interface:

    mlz_compress_simple(
        destination_buffer,
        destination_buffer_size, /* = limit */
        source_buffer,
        source_buffer_size,
        compression_level /* use MLZ_COMPRESS_MAX or 10 for maximum compression */
    )

returns 0 on failure or size of compressed block

    mlz_decompress_simple(
        destination_buffer,
        destination_buffer_size, /* = limit */
        source_buffer,
        source_buffer_size
    )

returns 0 on failure or size of decompressed block
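a minimal round-trip sketch for the two calls above, mainly to show the "0 means failure" checks. it is not taken from the mlz sources: the typedefs are an assumption based on the argument order shown here (the real prototypes are in mlz_enc.h and mlz_dec.h), and store_compress / store_decompress are hypothetical stand-ins that only copy bytes, so the sketch builds without the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed shapes of the two block calls, following the argument order above.
   Check mlz_enc.h / mlz_dec.h for the real prototypes. */
typedef size_t (*block_compress_fn)(void *dst, size_t dst_size,
                                    const void *src, size_t src_size, int level);
typedef size_t (*block_decompress_fn)(void *dst, size_t dst_size,
                                      const void *src, size_t src_size);

/* Hypothetical stand-in "codec" that just stores the bytes.
   Like the calls above, it returns 0 on failure. */
static size_t store_compress(void *dst, size_t dst_size,
                             const void *src, size_t src_size, int level)
{
    (void)level;                       /* a store-only codec has no levels */
    if (src_size == 0 || src_size > dst_size)
        return 0;
    memcpy(dst, src, src_size);
    return src_size;
}

static size_t store_decompress(void *dst, size_t dst_size,
                               const void *src, size_t src_size)
{
    if (src_size == 0 || src_size > dst_size)
        return 0;
    memcpy(dst, src, src_size);
    return src_size;
}

/* Compress src into a buffer of packed_limit bytes, decompress it again
   and compare with the input. Returns the compressed size, or 0 if any
   step failed. */
static size_t roundtrip(block_compress_fn comp, block_decompress_fn decomp,
                        const void *src, size_t src_size,
                        size_t packed_limit, int level)
{
    unsigned char *packed, *unpacked;
    size_t packed_size, unpacked_size, result = 0;

    assert(comp && decomp && src);
    packed = (unsigned char *)malloc(packed_limit ? packed_limit : 1);
    unpacked = (unsigned char *)malloc(src_size ? src_size : 1);
    if (packed && unpacked) {
        packed_size = comp(packed, packed_limit, src, src_size, level);
        if (packed_size != 0) {        /* 0 = compression failed */
            unpacked_size = decomp(unpacked, src_size, packed, packed_size);
            if (unpacked_size == src_size &&
                memcmp(unpacked, src, src_size) == 0)
                result = packed_size;
        }
    }
    free(packed);
    free(unpacked);
    return result;
}
```

with mlz linked in, the same helper should accept mlz_compress_simple and mlz_decompress_simple with MLZ_COMPRESS_MAX as the level, provided their prototypes really match the typedefs.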

streaming interface: see headers and mlzc.c for detailed description

algorithm:
    plain lz77 with deep lazy matching
    64kb "sliding dictionary", handling extreme cases (long literal runs and extremely well compressed data)
    matcher is simple hash-list (hash-chain)
    data format is described in source files (using 24-bit bit accumulator)
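to illustrate what a hash-chain matcher over a 64kb window looks like, here is a generic sketch. it is not the mlz matcher: all names, the 3-byte hash, the table size and the chain depth limit are assumptions picked for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW_SIZE 65536u          /* 64 KB sliding dictionary */
#define HASH_BITS   12              /* illustrative table size */
#define HASH_SIZE   (1u << HASH_BITS)
#define MIN_MATCH   3
#define MAX_CHAIN   64              /* illustrative search depth limit */
#define NO_POS      ((size_t)-1)

typedef struct {
    size_t head[HASH_SIZE];         /* newest position for each hash bucket */
    size_t prev[WINDOW_SIZE];       /* older position with the same hash, per window slot */
} matcher;

/* Multiplicative hash of the next three bytes. */
static unsigned hash3(const unsigned char *p)
{
    uint32_t v = ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | p[2];
    return (unsigned)((v * 2654435761u) >> (32 - HASH_BITS));
}

static void matcher_init(matcher *m)
{
    size_t i;
    for (i = 0; i < HASH_SIZE; i++)
        m->head[i] = NO_POS;
}

/* Put position pos at the front of its hash chain. */
static void matcher_insert(matcher *m, const unsigned char *buf, size_t n, size_t pos)
{
    unsigned h;
    assert(pos + MIN_MATCH <= n);
    h = hash3(buf + pos);
    m->prev[pos & (WINDOW_SIZE - 1)] = m->head[h];
    m->head[h] = pos;
}

/* Walk the chain for buf[pos..], newest candidate first. Returns the longest
   match length, or 0 if nothing of at least MIN_MATCH bytes was found.
   *dist is only meaningful when the result is nonzero. */
static size_t matcher_find(const matcher *m, const unsigned char *buf, size_t n,
                           size_t pos, size_t *dist)
{
    size_t best = 0, cand, len;
    int depth = MAX_CHAIN;

    if (pos + MIN_MATCH > n)
        return 0;
    cand = m->head[hash3(buf + pos)];
    while (cand != NO_POS && depth-- > 0) {
        /* chains run from newer to older, so stop at the window edge */
        if (cand >= pos || pos - cand >= WINDOW_SIZE)
            break;
        len = 0;
        while (pos + len < n && buf[cand + len] == buf[pos + len])
            len++;
        if (len > best) {
            best = len;
            *dist = pos - cand;
        }
        cand = m->prev[cand & (WINDOW_SIZE - 1)];
    }
    return best >= MIN_MATCH ? best : 0;
}
```

call matcher_find for a position before matcher_insert on the same position, otherwise the chain starts at the position itself and the search stops at once. the struct is roughly half a megabyte with 64-bit size_t, so keep it static or on the heap. a lazy parser would additionally look at the match found at pos + 1 before committing to the one at pos.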

newly added experimental level 11 uses naive SLOW nearly-optimal parsing (up to 3 orders of magnitude slower for some files, so max level is kept at 10)
gains are typically in the range of 1 to 3%, so not much

new compression mode for command line tool: -rm (raw in-memory compression), useful for embedding compressed data
format:
    LE32 uncompressed_size
    LE32 adler32
    LE32 compressed_size
    ... compressed_data ...
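a small sketch of reading that header from an embedded blob. the struct and function names are made up for the example, and it assumes the three fields are packed back to back with the compressed data starting at byte 12. the adler32 helper is the standard Adler-32; which bytes the stored checksum covers is not stated here, so check the sources before comparing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one -rm blob (names are not from mlz). */
typedef struct {
    uint32_t uncompressed_size;
    uint32_t adler32;
    uint32_t compressed_size;
    const unsigned char *data;      /* points into the input buffer */
} raw_block;

/* Read a 32-bit little-endian value regardless of host byte order. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *out if buf holds a full 12-byte header plus
   compressed_size bytes of payload, otherwise returns 0. */
static int raw_block_parse(const unsigned char *buf, size_t size, raw_block *out)
{
    assert(buf && out);
    if (size < 12)
        return 0;
    out->uncompressed_size = read_le32(buf);
    out->adler32 = read_le32(buf + 4);
    out->compressed_size = read_le32(buf + 8);
    if (out->compressed_size > size - 12)
        return 0;                   /* payload is truncated */
    out->data = buf + 12;
    return 1;
}

/* Standard Adler-32 checksum (simple byte-at-a-time version). */
static uint32_t adler32_of(const unsigned char *p, size_t n)
{
    uint32_t a = 1, b = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        a = (a + p[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}
```

after a successful parse, data and compressed_size can be handed to the decompressor together with a destination buffer of uncompressed_size bytes.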

have fun