jpeg-decoder
jpeg-decoder copied to clipboard
"failed to decode huffman code" on image that works fine in other software
The following image fails to decode with a "failed to decode huffman code" error, but it displays "fine" (with some corruption in the bottom part of the image) in Firefox, sxiv, VS Code, and Dolphin:
This is a case of a corrupt entropy stream, but it's not quite clear what the corruption is. I can, however, illustrate why the stream yields the error. We'll start from the file offset 0xD19 (3353'th byte). Here begins the coefficient decode for the MCU block (x=4, y=14), or in coordinates of the image, (32, 112), specifically component {id=2}.
Please excuse the crude diagram:
|---69--|---C1--|---A9--|---5B--|---B7--|---1F--|---BE--|---8F--|---FE--|---FB--
01101001110000011010100101011011100001110001111110111110100011111111111011111011
H3: ---100 H6: ------011100 ????????????????????
H6: ------001101 H1: --1
H1: --0 H34: --------11
H1: --0 H1: --0
H17: ----0 H0: --
We start off with reading the DC coefficient; H3 means the Huffman symbol for '3' (110b in this table), followed by 3 bits (100b), to be calculated as some coefficient offset value. Irrelevant here. Then it reads the AC coefficients, until it reaches the Huffman '0' symbol, signifying that it's the end of the coefficient information. All good'ish (although I think this row is a bit suspect in general)!
Now comes the next set of coefficients we need to read, for component {id=3}. We have the following set of bits ahead in the stream: 11111111111011111011 -- eleven one-bits ahead.
The problem is that the DC Huffman table in use has no matching code for this pattern of eleven one-bits. It's got the following entries:
Value | Code | Value | Code |
---|---|---|---|
0 | 00 | 6 | 111110 |
1 | 01 | 7 | 1111110 |
2 | 10 | 8 | 11111110 |
3 | 110 | 9 | 111111110 |
4 | 1110 | 10 | 1111111110 |
5 | 11110 | 11 | 11111111110 |
At this point, I'm not sure what the correct approach to even trying to recover from the issue would be. Somewhere the stream got garbled by some bits, but there's (to my brief look) no obvious place. I suppose the bare minimum one could theoretically do to recover from this type of error, would be to scan forward for the next RST marker, and drop all of the coefficients until then (super ugly), but the problem here is that this stream doesn't have any.
I have a couple more example jpegs that are handled by other software by fail to load with jpeg-decoder
because of the "failed to decode huffman code" error.
Admittedly, my sample files look even more like garbage than @SludgePhD's file. But it would be nice if image-rs/jpeg-decoder
could make a best effort. My use case is to generate image fuzzy hashes for malware identification. So my hope is to make a best effort even if the image is technically malformed.
@micahsnyder If you want to match what other viewers display as closely as possible, I'd suggest relying on libjpeg
since that's what those viewers are probably using. Any other decoder implementation is almost surely going to display some corrupted images differently since they don't usually target "bug for bug compatibility"
@fintelia that's good advice. Thanks. In my case it's okay if it's not rendered the same, so long as it gives a usable image that can be hashed. Our tools for both creating the hashes and matching the hashes use the same code with image-rs
/ jpeg-decoder
. So I think it is okay for us if the best effort comes out a little differently.
So this basically relates to #252, except "partially-invalid" would be more accurate than "incomplete", and you wouldn't want the decoder to give up as early as possible if you want this to be generally useful.
This patch puts the decoder on par with libjpeg-turbo and FFmpeg (with this patch applied) in my very limited tests.
The first part of the patch (the read_marker
part) was done to handle the image from @micahsnyder. This may not be how/where EOF errors should/would be handled.
The last part (continuing after decode block errors) should be generally useful. It fixed an image that brought me here, and it's enough to handle the image from @SludgePhD.