parallel-png-proposal
Ambiguous Decodes
I have identified a potential (security?) flaw with this approach.
I think it is possible to craft a file, where:
decompress(a + b) != decompress(a) + decompress(b)
This could happen if `a` ends midway through a non-compressed block.
It is therefore possible for an image to have two possible interpretations, depending on whether a parallel or non-parallel decoder decodes it.
This can be mitigated by the decoder, by checking that there is no unprocessed data in each piece of the zlib stream. My implementation does not currently do this!
I realise this sounds a bit implausible, so I will try to come up with a proof-of-concept.
Proof-of-concept:
```python
import zlib

def decompress_headerless(data):
    d = zlib.decompressobj(wbits=-15)
    result = d.decompress(data)
    result += d.flush()
    # do all the checks we can?
    assert(len(d.unconsumed_tail) == 0)
    assert(len(d.unused_data) == 0)
    return result

c = zlib.compressobj(level=9, wbits=-15)

# there is an off-by-6 bug somewhere in this rat's nest, and I can't be bothered to figure it out...
TARGET_SIZE = 64 + 6

a_body = b"Hello, world!"
a_body += b" " * (TARGET_SIZE - 6 - len(a_body))

a = b""
a += c.compress(a_body)
a += c.flush(zlib.Z_FULL_FLUSH)

# declare the start of a non-compressed block
blen = TARGET_SIZE // 2 + 5
a += b"\x00"  # not the last block
a += blen.to_bytes(2, "little")
a += (blen ^ 0xFFFF).to_bytes(2, "little")

c2 = zlib.compressobj(level=9, wbits=-15)

b = b""
b += c2.compress(b" SECRET MESSAGE ")
b += c2.flush(zlib.Z_FULL_FLUSH)

endlen = TARGET_SIZE - len(b)  # - 5
b += b"\x01"  # last block
b += endlen.to_bytes(2, "little")
b += (endlen ^ 0xFFFF).to_bytes(2, "little")
b += b"X" * (endlen - TARGET_SIZE // 2)

b += c2.compress(b" ALTERNATE MSG ")
b += c2.flush(zlib.Z_FULL_FLUSH)

endlen = TARGET_SIZE - len(b)  # - 5
b += b"\x01"  # last block
b += endlen.to_bytes(2, "little")
b += (endlen ^ 0xFFFF).to_bytes(2, "little")
b += b"Y" * endlen

print("zlib part a:", a.hex())
print("zlib part b:", b.hex())

piece_a = decompress_headerless(a)
piece_b = decompress_headerless(b)

assert(len(piece_a) == 64)
assert(len(piece_b) == 64)

interpretation_1 = piece_a + piece_b
interpretation_2 = decompress_headerless(a + b)

assert(len(interpretation_1) == 128)
assert(len(interpretation_2) == 128)

print(interpretation_1)
print(interpretation_2)

# this is not true!!!
assert(interpretation_1 == interpretation_2)

"""
OUTPUT:

zlib part a: f248cdc9c9d75128cf2fca495154201d00000000ffff002800d7ff
zlib part b: 520876750e720d51f0750d0e7674775500000000ffff013000cfff585858585858585858585858585270f409710df2730c7155f00d765700000000ffff010900f6ff595959595959595959
b'Hello, world! SECRET MESSAGE XXXXXXXXXXXXXRp\xf4\tq\r\xf2s\x0cqU\xf0\rvW\x00\x00\x00\x00\xff\xff\x01\t\x00\xf6\xffYYYYYYYYY'
b'Hello, world! R\x08vu\x0er\rQ\xf0u\r\x0evtwU\x00\x00\x00\x00\xff\xff\x010\x00\xcf\xffXXXXXXXXXXXXX ALTERNATE MSG YYYYYYYYY'
Traceback (most recent call last):
  File "./zlib_trick.py", line 68, in <module>
    assert(interpretation_1 == interpretation_2)
AssertionError
"""
```
I did all the checks I could, but I'm not quite sure how to make zlib complain about incomplete data when decompressing piece `a`.
OK - I get what you're saying. Parallel PNG assumes that each IDAT can be independently decoded, started at the first byte of each IDAT.
> This can be mitigated by the decoder, by checking that there is no unprocessed data in each piece of the zlib stream. My implementation does not currently do this!

This is what my decoder can (and will) do. So I don't see any security risk. You just need to make sure that each IDAT block's bytes are fully consumed by the decompressor. If not, you stop.
Also, any fuzzed/compliant Deflate decompressor should be able to consume any input (valid or not) without crashing, overwriting memory, etc. I don't see any security risk. Worst case, you can always fall back to trying serial decompression if the parallel decomp fails (that's what I'm considering doing). If that fails, the PNG is invalid.
I think the Parallel PNG proposal should clearly document exactly what each IDAT block should contain. It should be a requirement that each IDAT block end on a block (not bit) boundary, and that there are no extra bytes remaining in the IDAT block.
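One way a decoder could enforce that requirement is to probe each non-final piece for byte-aligned block-boundary termination. This is a sketch of my own (the probe trick and the `ends_on_block_boundary` name are not from the proposal): append an empty stored block and require that it decodes to nothing extra.

```python
import zlib

# Sketch of a boundary check (my own construction, not from the proposal).
# A well-formed non-final piece ends on a byte-aligned block boundary, e.g.
# right after a Z_FULL_FLUSH marker. We probe by appending an empty stored
# block (00 00 00 ff ff): decoded at a true boundary it contributes zero
# output, but if the piece ended inside a declared-but-missing stored block
# (the trick above), the probe bytes come back out as literal data.
EMPTY_STORED_BLOCK = b"\x00\x00\x00\xff\xff"

def ends_on_block_boundary(piece):
    d = zlib.decompressobj(wbits=-15)
    base = d.decompress(piece) + d.flush()
    d2 = zlib.decompressobj(wbits=-15)
    probed = d2.decompress(piece + EMPTY_STORED_BLOCK) + d2.flush()
    # extra output, leftover input, or a premature final block all fail
    return probed == base and not d2.unused_data and not d2.eof
```

This only handles the byte-aligned case; a piece truncated at a bit boundary inside a Huffman-coded block would need more careful treatment, so treat it as a starting point rather than a complete defense.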
I realised that Apple made this same mistake in their own parallel-decodable PNG implementation: https://www.da.vidbuchanan.co.uk/widgets/pngdiff/
And yes, this is a 100% solvable problem - but clearly even Apple didn't get it right, so it's something that needs to be approached with care.
Interesting - I wasn't aware of this. It's a solid existence proof that it's doable.
> I did all the checks I could, I'm not quite sure how to make zlib complain about incomplete data when decompressing piece `a`

Check the value of `d.eof`:
```
$ cat zlib_trick.py
import base64
import zlib

def decompress_headerless(data, info=''):
    d = zlib.decompressobj(wbits=-15)
    result = d.decompress(data)
    result += d.flush()
    assert(len(d.unconsumed_tail) == 0)
    assert(len(d.unused_data) == 0)
    print(f'{info} properly formed: {d.eof}')
    return result

a = base64.b16decode('f248cdc9c9d75128cf2fca495154201d00000000ffff002800d7ff'.upper())
b = base64.b16decode('520876750e720d51f0750d0e7674775500000000ffff013000cfff585858585858585858585858585270f409710df2730c7155f00d765700000000ffff010900f6ff595959595959595959'.upper())

print(f'zlib part a: {a.hex()}')
print(f'zlib part b: {b.hex()}')

piece_a = decompress_headerless(a, 'Separately, piece a')
piece_b = decompress_headerless(b, 'Separately, piece b')

interpretation_1 = piece_a + piece_b
interpretation_2 = decompress_headerless(a + b, 'Concatenated a + b')

print(interpretation_1)
print(interpretation_2)
$ python3 zlib_trick.py
zlib part a: f248cdc9c9d75128cf2fca495154201d00000000ffff002800d7ff
zlib part b: 520876750e720d51f0750d0e7674775500000000ffff013000cfff585858585858585858585858585270f409710df2730c7155f00d765700000000ffff010900f6ff595959595959595959
Separately, piece a properly formed: False
Separately, piece b properly formed: True
Concatenated a + b properly formed: True
b'Hello, world! SECRET MESSAGE XXXXXXXXXXXXXRp\xf4\tq\r\xf2s\x0cqU\xf0\rvW\x00\x00\x00\x00\xff\xff\x01\t\x00\xf6\xffYYYYYYYYY'
b'Hello, world! R\x08vu\x0er\rQ\xf0u\r\x0evtwU\x00\x00\x00\x00\xff\xff\x010\x00\xcf\xffXXXXXXXXXXXXX ALTERNATE MSG YYYYYYYYY'
```
Aha, thanks, that's what I was looking for!
I think the big problem with this will be errors that are specific to parallel decoding but don't count as errors when decoding linearly. If you fail at parallel decoding at any point and don't fall back to linear decoding, you're technically non-conformant. Fun.
It's not like APNG where you fall back to the default image, you have to fall back to the standard decoding method and potentially re-decode some rows.
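That fallback shape can be sketched like this (hypothetical helper names, raw DEFLATE only, ignoring PNG chunk framing and row re-decoding): try each piece independently, and on any per-piece anomaly re-decode the whole stream serially.

```python
import zlib

def serial_decode(pieces):
    # the conformant path: one decompressor over the concatenated stream
    d = zlib.decompressobj(wbits=-15)
    return d.decompress(b"".join(pieces)) + d.flush()

def parallel_decode_with_fallback(pieces):
    # Try to decode each piece independently; on any per-piece anomaly
    # (leftover input, or a final block anywhere but the last piece),
    # fall back to serial decoding so we stay conformant. Note this does
    # NOT catch the crafted ambiguity above - pieces must additionally be
    # checked for ending exactly on a block boundary.
    out = b""
    for i, piece in enumerate(pieces):
        d = zlib.decompressobj(wbits=-15)
        chunk = d.decompress(piece) + d.flush()
        is_last = (i == len(pieces) - 1)
        if d.unused_data or d.unconsumed_tail or (d.eof != is_last):
            return serial_decode(pieces)
        out += chunk
    return out
```

(In a real decoder the pieces would be decoded concurrently and only the combine/verify step would be serial; this sketch keeps everything sequential for clarity.)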
Hi! A newb about PNG files here. What does "`a` ends midway through a non-compressed block" mean?
See https://datatracker.ietf.org/doc/html/rfc1951
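To unpack that: in DEFLATE (RFC 1951, section 3.2.4), a non-compressed ("stored") block is a header (final-block flag plus block type 00), a 16-bit length LEN, its complement NLEN, then LEN literal bytes. "Ends midway" means the stream stops before all LEN promised bytes have arrived. A small illustration of my own (assuming the block starts byte-aligned):

```python
import zlib

# Build a stored (non-compressed) DEFLATE block, per RFC 1951 section 3.2.4:
# one header byte (BFINAL flag + BTYPE=00, assuming byte alignment), then
# LEN and NLEN (= LEN ^ 0xFFFF) as little-endian 16-bit values, then the data.
def stored_block(data, final=False):
    assert len(data) <= 0xFFFF
    out = b"\x01" if final else b"\x00"
    out += len(data).to_bytes(2, "little")
    out += (len(data) ^ 0xFFFF).to_bytes(2, "little")
    return out + data

full = stored_block(b"hello", final=True)
truncated = full[:-3]  # "ends midway": the header promises 5 bytes, only 2 follow

d = zlib.decompressobj(wbits=-15)
print(d.decompress(full), d.eof)         # all 5 bytes out, stream complete

d2 = zlib.decompressobj(wbits=-15)
print(d2.decompress(truncated), d2.eof)  # partial data out, stream incomplete
```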
> I did all the checks I could, I'm not quite sure how to make zlib complain about incomplete data when decompressing piece `a`

> Check the value of `d.eof`:
I would like to join the conversation, and if there is any mistake in my view, please point it out.

In short, my point is that `d.eof` is not helpful for checking incomplete data. `d.eof` tells you whether the last complete block was a final block; it says nothing about the block currently being processed, even if that block so far contains only a header. Here is an example:
```
~/Desktop/ cat test2.py
import base64
import zlib

def compress_headerless(data, final, info=''):
    e = zlib.compressobj(level=9, wbits=-15)
    result = e.compress(data)
    result += e.flush(zlib.Z_FINISH if final else zlib.Z_FULL_FLUSH)
    print(f'{info} compressed as: {result.hex()}')
    return result

def decompress_headerless(data, info=''):
    d = zlib.decompressobj(wbits=-15)
    result = d.decompress(data)
    result += d.flush()
    print(f'{info} properly formed: {d.eof}')
    assert(len(d.unconsumed_tail) == 0)
    assert(len(d.unused_data) == 0)
    return result

a_body = b"Hello, world!"

a = compress_headerless(a_body, False, 'NotFinal block')
decompress_headerless(a, 'NotFinal block')
a += base64.b16decode('00'.upper())
decompress_headerless(a, 'NotFinal block with extra')

a2 = compress_headerless(a_body, True, 'Final block')
decompress_headerless(a2, 'Final block')
a2 += base64.b16decode('00'.upper())
decompress_headerless(a2, 'Final block with extra')
~/Desktop/ python3 test2.py
NotFinal block compressed as: f248cdc9c9d75128cf2fca495104000000ffff
NotFinal block properly formed: False
NotFinal block with extra properly formed: False
Final block compressed as: f348cdc9c9d75128cf2fca49510400
Final block properly formed: True
Final block with extra properly formed: True
Traceback (most recent call last):
  File "/Users/smallay/Desktop/test2.py", line 33, in <module>
    decompress_headerless(a2, 'Final block with extra')
  File "/Users/smallay/Desktop/test2.py", line 19, in decompress_headerless
    assert(len(d.unused_data) == 0)
AssertionError
```
So I think checking for incomplete data is a bit more difficult than it looks, because the IDAT chunks of a PNG always contain non-final blocks, except for the last one.
> the IDAT chunks of PNG always contain non-final blocks except for the last one.

In practice the stream doesn't even have to end: decoders just stop once they have enough data and skip the rest.
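For what it's worth, Python's zlib exposes exactly this: `decompressobj.decompress()` takes a `max_length` argument, so a consumer can stop as soon as it has enough output and never look at the rest of the stream.

```python
import zlib

# A decoder that only needs the first N output bytes can stop early,
# leaving the remainder of the stream unread.
c = zlib.compressobj(level=9, wbits=-15)
stream = c.compress(b"A" * 100) + c.flush()

d = zlib.decompressobj(wbits=-15)
first10 = d.decompress(stream, 10)  # max_length=10: stop after 10 output bytes
assert first10 == b"A" * 10

# if it changes its mind, it can resume from the unconsumed input
rest = d.decompress(d.unconsumed_tail) + d.flush()
assert first10 + rest == b"A" * 100
```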