zstd icon indicating copy to clipboard operation
zstd copied to clipboard

Extra bits on block are not reported as error.

Open klauspost opened this issue 3 years ago • 2 comments

Describe the bug

Extra bits on stream does not get reported, even after #1598

To Reproduce

Decompress: fea5d210d01530bfeb0130452ef432734b23e744.zst.gz Execute zstd -d fea5d210d01530bfeb0130452ef432734b23e744.zst

Expected behavior

The sample contains 15 extra bits when the sequence has been decoded. These should be reported as an error.

zstd with debugging reports:

zstd-git\lib\decompress\zstd_decompress.c: ZSTD_decompressContinue (srcSize:17)
zstd-git\lib\decompress\zstd_decompress.c: ZSTD_decompressContinue: case ZSTDds_decompressBlock
zstd-git\lib\decompress\zstd_decompress.c: ZSTD_decompressContinue: case bt_compressed
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decompressBlock_internal (size : 17)
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decodeLiteralsBlock
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decodeLiteralsBlock : cSize=6, nbLiterals=5
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decodeSeqHeaders
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decompressSequences
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decompressSequences_body
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_initFseState : val=44 using 6 bits
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_initFseState : val=0 using 5 bits
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_initFseState : val=18 using 6 bits
zstd-git\lib\decompress\zstd_decompress_block.c: seq: litL=1, matchL=62, offset=1
zstd-git\lib\decompress\zstd_decompress_block.c: regenerated sequence size : 63
zstd-git\lib\decompress\zstd_decompress_block.c: seq: litL=2, matchL=62, offset=1
zstd-git\lib\decompress\zstd_decompress_block.c: regenerated sequence size : 64
zstd-git\lib\decompress\zstd_decompress_block.c: seq: litL=2, matchL=33, offset=1
zstd-git\lib\decompress\zstd_decompress_block.c: regenerated sequence size : 35
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decompressSequences_body: after decode loop, remaining nbSeq : 0
zstd-git\lib\decompress\zstd_decompress.c: ZSTD_decompressContinue: decoded size from block : 162
zstd-git\lib\decompress\zstd_decompress.c: ZSTD_decompressContinue: decoded size from frame : 162
zstd-git\lib\decompress\zstd_decompress.c: ZSTD_decompressContinue: decoded size from frame : 162
zstd-git\lib\decompress\zstd_decompress.c: stage zdss_read
zstd-git\lib\decompress\zstd_decompress.c: neededInSize = 0
fea5d210d01530bfeb0130452ef432734b23e744.zst: 162 bytes

Also tested with v1.5.2 release.

It seems we "agree" on decompression:

2022/03/09 12:13:06 Literals: 5 hash: 51174239790196767 and 3 sequences.
2022/03/09 12:13:06 Seq 0 Litlen: 1 mo: 1 (abs) ml: 62 . bits remain: 49
2022/03/09 12:13:06 Seq 1 Litlen: 2 mo: 1 (abs) ml: 62 . bits remain: 31
2022/03/09 12:13:06 Seq 2 Litlen: 2 mo: 1 (abs) ml: 33 . bits remain: 15
2022/03/09 12:13:06 error after: 15 extra bits on block, should be 0

Desktop (please complete the following information):

  • OS: Windown 10, 64 bits.
  • Compiler Visual Studio

Edit: Second example, 0 sequences, but 4 bytes left on stream after literal decoding: 447902c8e4faf807f9b8f9c1861abe7990c476dd.zst.gz

klauspost avatar Mar 09 '22 11:03 klauspost

Thanks for the bug report!

We'll aim to fix this one, but we're okay with not reporting all types of corruption. Our policy is that if you want error detection, enable checksumming. That said, we still like to report corruption whenever we can.

The differential fuzzing you're doing is super interesting! Glad to see that there is another robust enough implementation out there that differential fuzzing can reveal bugs in the reference decoder!

terrelln avatar Mar 14 '22 16:03 terrelln

One thing I've learnt from the decoder is that it checks everything :)

This should be a good (but not reliable) corruption detection method. Of course not as reliable as CRC, but it can also be seen as a good supplement for the chance of the 32 bit CRC colliding, just like the other checks.

Yeah. It is not a clean-room implementation, but still based on the spec and with peeks at the code for the "not-clear-enough-for-me" parts. Most of the remaining mismatches are because of how the wrapper is implemented.

klauspost avatar Mar 15 '22 12:03 klauspost