
Footgun with the `compressobj()` API

Open embg opened this issue 3 years ago • 2 comments
Being able to create multiple compressobj instances from a single ZstdCompressor allows for footguns such as:

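import zstandard

a = zstandard.ZstdCompressor()

# First stream: start compressing through `b`.
b = a.compressobj()
c = b.compress(b"prefix")

# Second stream, created from the same ZstdCompressor while `b` is still in progress.
d = a.compressobj()
e = d.compress(b"foo")
e += d.flush(zstandard.FLUSH_BLOCK)

# Back to `b`: one would expect its output to reflect the earlier b"prefix".
c = b.compress(b"foo")
c += b.flush(zstandard.FLUSH_BLOCK)

assert c != e  # Assertion fails!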

The API should protect users from interleaving calls on two compressobj instances created from the same ZstdCompressor. This could be accomplished via a counter in the ZstdCompressor that is atomically incremented each time a compressobj is created. Each compressobj would record the counter value it was created with and raise an error if compress() or flush() is called after the parent ZstdCompressor's counter has been incremented again.
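To make the counter idea concrete, here is a rough sketch written as a pure-Python wrapper rather than a change to the library itself. `GuardedCompressor`, `GuardedCompressObj`, `StaleCompressObjError`, and `ZlibStandIn` are hypothetical names and are not part of python-zstandard. The wrapper only assumes the wrapped object has a `compressobj()` method whose result supports `compress()` and `flush()`. The zlib stand-in is there only so the sketch runs without zstandard installed.

```python
import threading
import zlib


class StaleCompressObjError(RuntimeError):
    """Raised when a compressobj is used after a newer sibling was created."""


class GuardedCompressor:
    """Hypothetical wrapper around any object that exposes compressobj()."""

    def __init__(self, compressor):
        self._compressor = compressor
        self._lock = threading.Lock()
        self._generation = 0  # bumped each time compressobj() is called

    def compressobj(self, *args, **kwargs):
        # Increment and read the counter under a lock so that concurrent
        # callers never end up with the same generation number.
        with self._lock:
            self._generation += 1
            generation = self._generation
            inner = self._compressor.compressobj(*args, **kwargs)
        return GuardedCompressObj(self, inner, generation)

    def _check(self, generation):
        # Best-effort check: a compressobj is stale once a newer one exists.
        if generation != self._generation:
            raise StaleCompressObjError(
                "a newer compressobj was created from the same compressor"
            )


class GuardedCompressObj:
    """Forwards compress()/flush() only while it is the newest compressobj."""

    def __init__(self, parent, inner, generation):
        self._parent = parent
        self._inner = inner
        self._generation = generation

    def compress(self, data):
        self._parent._check(self._generation)
        return self._inner.compress(data)

    def flush(self, *args):
        self._parent._check(self._generation)
        return self._inner.flush(*args)


class ZlibStandIn:
    """Stand-in backend (assumption) so the sketch runs without zstandard."""

    def compressobj(self):
        return zlib.compressobj()
```

Wrapping a real compressor would presumably look like `GuardedCompressor(zstandard.ZstdCompressor())`. With that in place, the second `b.compress(b"foo")` in the example above would raise `StaleCompressObjError` instead of silently producing unexpected output.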

Thanks @thatch for identifying this issue and proposing the fix.

embg, Sep 16 '22 21:09

We might be able to add some checks here, but doing this comprehensively is difficult, if not impossible. For example, an application may want to abort an in-progress (de)compression operation while still preserving the (de)compressor instance for reuse. This is totally valid!

I think the best we can do in the short term is better document that temporally overlapping usage won't work and will lead to runtime errors.

indygreg, Oct 29 '22 20:10

Thanks so much for adding the docs in https://github.com/indygreg/python-zstandard/commit/61876592e606a6118e4e394498564fbfcbb962be!

I'm a little confused about the example you gave. Why preserve the compressobj instance for re-use rather than preserve the ZstdCompressor and create a new compressobj for each compression? Isn't the compressobj a very thin wrapper?
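For reference, the non-overlapping pattern I have in mind (keep one ZstdCompressor, create a fresh compressobj per compression, and finish it before starting the next) might look like the sketch below. `compress_each` is a hypothetical helper, not part of python-zstandard. It only assumes `compressor` has a `compressobj()` method whose result supports `compress()` and a no-argument `flush()` that finishes the stream, which I believe is the case for `zstandard.ZstdCompressor`.

```python
def compress_each(compressor, payloads):
    """Compress each payload separately, one compressobj at a time."""
    frames = []
    for payload in payloads:
        # A new compressobj per payload; the previous one is already
        # finished, so usage never overlaps in time.
        cobj = compressor.compressobj()
        frame = cobj.compress(payload)
        frame += cobj.flush()  # default flush mode finishes the stream
        frames.append(frame)
    return frames
```

Something like `compress_each(zstandard.ZstdCompressor(), [b"foo", b"bar"])` would then reuse the compressor across payloads without ever having two compressobj instances in flight.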

I agree with you that there isn't a perfect solution :)

embg, Oct 29 '22 22:10