
Fails on incompressible data

Open nskyav opened this issue 7 months ago • 2 comments

Some compressors (brieflz, density, fastlz, tornado, ucl_nrv2b, ucl_nrv2d, ucl_nrv2e, zlib, zlib-ng) fail on incompressible data (for example, a file created with dd if=/dev/random of=somefile bs=1024 count=1024) since 7b98759b. A partial revert of 7b98759b, as below, helps:

diff --git a/bench/lzbench.cpp b/bench/lzbench.cpp
index 74a4329..9006ed5 100644
--- a/bench/lzbench.cpp
+++ b/bench/lzbench.cpp
@@ -288,11 +288,15 @@ inline int64_t lzbench_compress(lzbench_params_t *params, std::vector<size_t>& c

         clen = compress((char*)inbuf, part, (char*)outbuf, outpart, codec_options);

-        if (clen <= 0)
+        if (clen <= 0 || clen == part)
         {
-            LZBENCH_PRINT(9, "ERROR: part=%lu clen=%ld in=%lu out=%lu\n", (uint64_t)part, clen, (uint64_t)(inbuf-start), (uint64_t)sum);
-            LZBENCH_PRINT(0, "ERROR: compression error in=%zu out=%ld/%zu (in_bytes=%lu out_bytes=%ld)\n", part, clen, outpart, (uint64_t)(inbuf+part-start), (int64_t)sum+clen);
-            return 0;
+            if (part > outsize) {
+                LZBENCH_PRINT(9, "ERROR: part=%lu clen=%ld in=%lu out=%lu\n", (uint64_t)part, clen, (uint64_t)(inbuf-start), (uint64_t)sum);
+                LZBENCH_PRINT(0, "ERROR: compression error in=%zu out=%ld/%zu (in_bytes=%lu out_bytes=%ld)\n", part, clen, outpart, (uint64_t)(inbuf+part-start), (int64_t)sum+clen);
+                return 0;
+            }
+            memcpy(outbuf, inbuf, part);
+            clen = part;
         }

         inbuf += part;
@@ -324,7 +328,13 @@ inline int64_t lzbench_decompress(lzbench_params_t *params, std::vector<size_t>&
         }
 #endif

-        dlen = decompress((char*)inbuf, part, (char*)outbuf, chunk_sizes[i], codec_options);
+        if (part == chunk_sizes[i]) // uncompressed
+        {
+            memcpy(outbuf, inbuf, part);
+            dlen = part;
+        } else {
+            dlen = decompress((char*)inbuf, part, (char*)outbuf, chunk_sizes[i], codec_options);
+        }

         if (dlen <= 0) {
             LZBENCH_PRINT(9, "DEC part=%lu dlen=%ld out=%lu\n", (uint64_t)part, dlen, (uint64_t)(outbuf - outstart));

There are similar failures with crush, lzo2a, bzip3, tamp, and yalz77, but I'm not sure whether they are related to 7b98759b and incompressible data.

nskyav May 13 '25 09:05
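
The fallback that the patch restores can be read in isolation as a "store raw on failure" scheme: if the codec returns an error or produces exactly as many bytes as the input, the chunk is copied verbatim, and the decoder recognises such a chunk because its stored size equals the original chunk size. The following is only a minimal sketch of that idea, not lzbench's code. The `codec_fn` signature, the two helper names, and the always-failing toy codec are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical codec signature, loosely modeled on the compress()/decompress()
// calls in the diff above. It is NOT lzbench's actual typedef.
using codec_fn = int64_t (*)(const char* in, std::size_t insize,
                             char* out, std::size_t outsize);

// Toy stand-in codecs that always report failure, as a real codec might
// on incompressible input.
inline int64_t failing_compress(const char*, std::size_t, char*, std::size_t) { return 0; }
inline int64_t failing_decompress(const char*, std::size_t, char*, std::size_t) { return 0; }

// Compress one chunk. If the codec fails or returns exactly `part` bytes,
// store the chunk raw instead. Returns bytes written, or 0 on error.
inline int64_t compress_or_store(codec_fn compress, const char* in, std::size_t part,
                                 char* out, std::size_t outsize) {
    int64_t clen = compress(in, part, out, outsize);
    if (clen <= 0 || static_cast<std::size_t>(clen) == part) {
        if (part > outsize) return 0;          // raw copy would not fit either
        std::memcpy(out, in, part);            // stored (uncompressed) chunk
        clen = static_cast<int64_t>(part);
    }
    return clen;
}

// Decompress one chunk. A chunk whose stored size equals the original
// chunk size is taken to be a raw copy and is not passed to the codec.
inline int64_t decompress_or_copy(codec_fn decompress, const char* in, std::size_t part,
                                  char* out, std::size_t chunk_size) {
    if (part == chunk_size) {
        std::memcpy(out, in, part);
        return static_cast<int64_t>(part);
    }
    return decompress(in, part, out, chunk_size);
}
```

With this scheme a codec that refuses every chunk still round-trips, but both directions then run at memcpy speed rather than at the codec's own speed.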

I decided to remove the code that worked around incompressible data for many compressors, because it reported memcpy speed instead of the compressor's real speed or an error. Incompressible data should be handled by the compressor itself, not by lzbench. I reported the issue for bzip3 at https://github.com/kspalaiologos/bzip3/issues/156

inikep May 13 '25 13:05

lzbench now reserves input_size + input_size/8 + 1024 bytes of memory for the output buffer, where input_size is the file size or the block size: https://github.com/inikep/lzbench/blob/master/bench/lzbench.h#L26 For some simple LZ77 compressors such as pithy or yalz77 this may not be enough, and lzbench may report errors as well.

inikep May 13 '25 13:05
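
To make the buffer bound from the last comment concrete, here is a small sketch of the arithmetic. The function names are hypothetical and are not taken from lzbench.h; only the formula input_size + input_size/8 + 1024 comes from the thread. For the 1 MiB file produced by the dd command in the report, the bound is 1180672 bytes, that is, room for 12.5% expansion plus 1 KiB.

```cpp
#include <cstddef>

// Output-buffer bound quoted in the thread: input_size + input_size/8 + 1024.
// Hypothetical helper name; see bench/lzbench.h for the actual definition.
constexpr std::size_t output_buffer_bound(std::size_t input_size) {
    return input_size + input_size / 8 + 1024;
}

// True when a codec's reported output length is positive and fits the bound.
// A codec that expands its input by more than 12.5% + 1 KiB would not fit.
constexpr bool fits_in_output_buffer(long long clen, std::size_t input_size) {
    return clen > 0 &&
           static_cast<std::size_t>(clen) <= output_buffer_bound(input_size);
}

// 1048576 + 131072 + 1024 = 1180672
static_assert(output_buffer_bound(1024 * 1024) == 1180672, "bound for a 1 MiB input");
```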