
Q: Input changing during compression semantics?


I'm a maintainer of QEMU's migration code, and we recently found out the hard way that zlib requires the input data to be stable during compression; I'd like to know what zstd's requirements are. QEMU is a bit odd in that it compresses the contents of guest RAM while it is live migrating, but if the data changes it will resend it again.

So, specifically for ZSTD_compressStream2 with the default 0 threads:

a) Is it legal for the input to change during the call to compressStream2?

b) If it's not legal, are the consequences defined?

  i) Something might crash.

  ii) This particular block may contain bogus data on decompression.

  iii) Some other entirely unrelated block might end up with bogus data on decompression.

dagrh · Jul 13 '22 15:07

Ooh, interesting and tricky question.

a) Is it legal for the input to change during the call to compressStream2?

... no? I don't think it ever even occurred to us to prohibit this, but it's pretty clearly against the spirit of the API. However, that doesn't mean it won't work.

b) If it's not legal, are the consequences defined

i) Something might crash

No: I believe zstd should never crash under these circumstances, although we don't have any tests that validate that...

ii) This particular block may contain bogus data on decompression

Yes: The block in question may contain bogus data. (Although see below.)

iii) Some other entirely unrelated block might end up with bogus data on decompression.

No: Subsequent blocks should not be affected.

In more detail:

Since you have a system for patching up subsequent writes after the fact (resending changed pages), the concern is not that zstd will capture a torn snapshot of the input (e.g., that it catches a later write to one address but misses an earlier write to another). The concern is that zstd will operate on an inconsistent input, i.e., the first time it inspects some location it sees one value, and a later read of the same location returns a different one. This kind of inconsistency could cause zstd to propagate incorrect values to other parts of the message (although even then I think it wouldn't crash).

Fortunately, if you are doing streaming compression, zstd does not operate on the provided input buffer in place. The first thing each compression call does is memcpy() the provided input into an internal stream-history buffer in the ZSTD_CCtx (source). It then operates only on that internal buffer, not on the buffer you provided, so the compression should remain internally consistent. I think it should actually work for your purposes.
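For concreteness, here is a minimal streaming-compression sketch (function and buffer names are illustrative, not QEMU's actual code); with the default single-threaded setup, each ZSTD_compressStream2() call copies the presented input into the CCtx's internal history buffer before doing any matching against it:

```c
#include <stdio.h>
#include <zstd.h>

/* Minimal streaming-compression sketch (illustrative, not QEMU's code).
 * With 0 worker threads, each ZSTD_compressStream2() call first copies
 * the presented input into the CCtx's internal history buffer and then
 * matches only against that copy, so zstd's view of already-consumed
 * bytes cannot change underneath it.
 * Assumes dstCapacity is large enough for the produced output. */
static size_t compress_chunk(ZSTD_CCtx *cctx,
                             const void *src, size_t srcSize,
                             void *dst, size_t dstCapacity,
                             ZSTD_EndDirective mode)
{
    ZSTD_inBuffer  in  = { src, srcSize, 0 };
    ZSTD_outBuffer out = { dst, dstCapacity, 0 };

    for (;;) {
        size_t const ret = ZSTD_compressStream2(cctx, &out, &in, mode);
        if (ZSTD_isError(ret)) {
            fprintf(stderr, "zstd: %s\n", ZSTD_getErrorName(ret));
            return 0;
        }
        /* Finished when all input is consumed and, for ZSTD_e_end,
         * the frame epilogue has been fully flushed (ret == 0). */
        if (in.pos == in.size && (mode != ZSTD_e_end || ret == 0))
            break;
    }
    return out.pos;  /* compressed bytes written to dst */
}
```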

This guarantee only holds, though, if you actually do streaming compression and trigger that copy. Zstd has an optimization whereby an invocation that can be served by the single-shot in-place implementation is delegated to it, skipping the copy. The existing optimization is easily avoided: make sure the first call to ZSTD_compressStream2() in a frame does not pass ZSTD_e_end (source), and don't set ZSTD_bm_stable. Beyond that, I'm somewhat concerned that you would be relying on undocumented behavior, where we might introduce new optimizations in the future. For example, when a chunk of input presented to us is much larger than the window size, we might want to operate on it in place and only copy the window-sized tail into our internal buffer.
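As a rough sketch of that call pattern (my reading of the current behavior, not a documented guarantee; compress_page() is a hypothetical per-block helper and compress_chunk() is the illustrative function from the snippet above):

```c
#include <zstd.h>

/* Hypothetical helper mirroring a per-block compression step; reuses
 * compress_chunk() from the sketch above. The important detail is that
 * the first ZSTD_compressStream2() call of the frame passes
 * ZSTD_e_continue (not ZSTD_e_end), and the stable-input-buffer mode
 * (ZSTD_bm_stable) is left at its default, so zstd takes the buffered
 * streaming path and copies the input internally. */
size_t compress_page(ZSTD_CCtx *cctx,
                     const void *page, size_t page_size,
                     void *dst, size_t dst_cap)
{
    size_t pos = 0;

    ZSTD_CCtx_reset(cctx, ZSTD_reset_session_only);  /* start a new frame */

    /* First call: ZSTD_e_continue avoids the single-shot in-place path. */
    pos += compress_chunk(cctx, page, page_size,
                          (char *)dst + pos, dst_cap - pos, ZSTD_e_continue);

    /* Final call: no new input, just flush and close the frame. */
    pos += compress_chunk(cctx, NULL, 0,
                          (char *)dst + pos, dst_cap - pos, ZSTD_e_end);

    return pos;  /* compressed frame size; error handling elided */
}
```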

I hope that provides some clarity.

P.S., we are very happy users of QEMU, which we use to test that zstd works on exotic platforms. Thanks for building a great tool!

felixhandte · Jul 13 '22 16:07

Thanks for the reply (and I'm glad you like QEMU). Given 'No: Subsequent blocks should not be affected', I think we'll avoid adding a copy in our code for now, but I want to make sure I understand your reply: does the paragraph starting 'This guarantee only holds...' weaken the 'Subsequent blocks should not be affected' guarantee, or only the less serious guarantee about the current block?

dagrh · Jul 13 '22 16:07