zstd
Specify that decoders may reject non-zero probabilities for larger offset codes than implementation supports
This is a somewhat pedantic change. Basically, the current wording states that out-of-range values in an FSE table are invalid, but technically offset codes exceeding the decoder's limit are still valid. This adds some text to allow decoders to reject probability tables containing unsupported offset codes.
There is one edge case with this that needs to be figured out though: The current spec states that a decoder may limit the maximum offset code length and the minimum recommended is 22. However, the predefined table permits offset codes as large as 28. So, this wording could mean that a decoder is allowed to reject frames using predefined codes, which sounds like something that shouldn't be allowed?
If that's the case, then this text would need to be changed to specify that it only applies to encoded tables, not to the predefined tables, and that predefined tables must be supported.
What would be the benefit of this?
I could easily see speed-focused encoders having a predefined table - even if suboptimal, having a non-zero probability for unused codes IMO shouldn't be a crime. For other encoders, I have experimented with suboptimal tables to increase their reusability.
If this is for an optimization, I don't see how you would avoid having to check offsets anyway - at least for one offset value once you have a filled window. This would just add another (albeit cheap) check for a decoded FSE table.
It seems reasonable that content that is unsupported by the decoder would already have been rejected when checking the requested window size. Decoded offsets are already checked against the window size - so if a block has an offset > window size it would be invalid anyway.
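The window check mentioned above can be modeled with a small sketch (a simplified illustration, not the reference decoder's actual logic; all names are hypothetical):

```python
def validate_offset(offset, decoded_so_far, window_size, dict_size=0):
    """Reject an offset that reaches beyond the valid back-reference range.

    A decoded sequence may only reference bytes inside the window (plus an
    optional dictionary), so any offset larger than that range is invalid
    regardless of which offset code produced it.
    """
    reachable = min(decoded_so_far, window_size) + dict_size
    return offset <= reachable
```

Under this model, an offset larger than the window is caught during sequence execution no matter what the FSE table declared.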
Finally, how do you know there isn't content already out there with this property?
Due to the behavior of less-than-one probabilities, the symbol range determines how much temporary storage is needed for FSE table construction, so it is beneficial for implementations to be allowed to limit the symbol range there. If the maximum offset code is bounded by the window size, then maybe the maximum allowed offset code should be specified as 63?
That isn't currently the behavior of the reference decoder though. The behavior of the reference decoder is what's described in this change: ZSTD_decodeSeqHeaders passes MaxOff (31) as the max to ZSTD_buildSeqTable, which passes it to FSE_readNCount, so a frame containing a non-zero probability for an offset code value larger than 31 will be rejected.
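Modeled in Python (a hypothetical sketch of the C behavior described above, not the actual implementation), the check amounts to walking the probabilities and failing once a non-zero entry appears for a symbol above the supplied maximum:

```python
MAX_OFF = 31  # maximum offset code value the reference decoder accepts

def check_offset_probabilities(probabilities, max_symbol=MAX_OFF):
    """Model of the symbol-range check during FSE header parsing.

    `probabilities[s]` is the normalized probability of offset code s;
    -1 denotes a less-than-one probability, which still counts as present.
    Note that the decoder must allocate counters for max_symbol + 1
    symbols, which is why bounding the symbol range also bounds the
    temporary storage mentioned above.
    """
    for symbol, prob in enumerate(probabilities):
        if prob != 0 and symbol > max_symbol:
            return False  # reject the frame, as the reference decoder does
    return True
```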
Offset codes can already be limited by the decoder:
Offset codes are values ranging from 0 to N. A decoder is free to limit its maximum N supported. Recommendation is to support at least up to 22. For information, at the time of this writing, the reference decoder supports a maximum N value of 31.
My reading is that you can freely reject above 22. There is, however, no requirement that an offset code's base value cannot exceed the window size, and IMO it is a rather unsafe assumption that zstd content out there will respect that.
The original sentence, mentioning "out-of-range values in an FSE table", was actually more focused on Literals and Match Lengths, for which the range is bounded.
In the case of offsets, this range is not bounded that clearly. One could derive that, given that the maximum possible window size is the size of the content, and this content size is limited to a 64-bit unsigned value, we may need up to 63. But then, it's also allowed to reference data in a dictionary of unspecified size, so we could imagine needing even more. This is all theoretical; there is no system out there able or willing to reach such extreme distances.
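For concreteness, the format decodes offset code N into an Offset_Value in [2^N, 2^(N+1) - 1] (the code value also gives the number of extra bits), and the actual offset is Offset_Value - 3 once past the repeat codes. The "up to 63" derivation then reduces to a bit-length computation (an illustrative sketch, not spec text):

```python
def offset_code_for(offset):
    """Smallest offset code whose range contains `offset`.

    Offset code N covers Offset_Value in [2**N, 2**(N+1) - 1], and
    Offset_Value = offset + 3 (values 1..3 are repeat codes), so the
    code is floor(log2(offset + 3)).
    """
    return (offset + 3).bit_length() - 1
```

A 4 GiB back-reference already needs code 32, and offsets near the 64-bit limit need code 63, matching the derivation above.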
And that's the difference between format's limitation, and implementation limitation. Limits on offset values are necessarily a matter of implementation limitation.
An implementation may reject a frame on the ground that it requests capabilities it cannot provide, but that's different from stating that the frame is invalid.
The modified paragraph was focused on determining what's an "invalid" or "corrupted" FSE header. I think it's not helpful to mix this concept with the one of "not supported by this implementation". I would separate them more clearly.
And indeed, as mentioned by @klauspost, the topic of implementation limitation is already covered by another paragraph.
I don't think we'll ever see a real implementation go up to 64-bit offsets, but >32-bit offsets is plausible.
I think there are two main things I want to address:
First, since any offset code >63 will always exceed the maximum possible window size, should 63 be specified as the top of the valid range? (This affects temp usage, which may matter for constrained implementations like ASICs.)
Second, right now the reference decoder rejects a frame if it has non-zero probs for offsets >31. Should the decoder be updated to permit larger offset codes and reject them as out-of-range in sequence decoding or execution instead? If not, should the spec provide guidance that someone writing an encoder could use to avoid hitting this kind of limit, and that someone writing a decoder could expect encoders to follow?
First, since any offset code >63 will always exceed the maximum possible window size, should 63 be specified as the top of the valid range? (This affects temp usage, which may matter for constrained implementations like ASICs.)
Yes, I think that's a reasonable restriction to set.
The maximum Window Size can be reached by declaring the entire Frame's decompressed size as the Window, and setting this decompressed size to an (unrealistically) gigantic size allowed by the format. Between 8 exabytes and 16 exabytes, the 63 offset code would be of use. And that's the maximum that can ever happen by following RFC 8878.
Of course, even this maximum is way beyond practical, so it's not really a "restriction". Rather, it is useful to specify if a hardware implementation must explicitly target a limit value.
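Following that reasoning, the largest offset code a conforming frame can usefully contain follows directly from its Window Size. A hypothetical helper, using the Offset_Value = offset + 3 mapping from the format:

```python
def max_useful_offset_code(window_size):
    """Largest offset code that can occur when back-references are
    limited to `window_size` bytes: the maximum offset is the window
    size itself, giving a maximum Offset_Value of window_size + 3."""
    return (window_size + 3).bit_length() - 1
```

An 8 MB window needs at most code 23, while pushing the window toward the format's 8-16 exabyte extreme reaches code 63.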
If not, should the spec provide guidance that someone writing an encoder could use to avoid hitting this kind of limit, and that someone writing a decoder could expect encoders to follow?
RFC 8878 provides this guidance:
For improved interoperability, it's recommended for decoders to support values of Window_Size up to 8 MB and for encoders not to generate frames requiring a Window_Size larger than 8 MB.
Of course, we are careful to phrase this as guidance, not a universal limit, because we know that specific implementations may prefer different limits, either smaller (notably on low-power ASIC systems) or larger (technically, even the zstd reference CLI is in this category, natively supporting 128 MB as a baseline and allowing up to 2 GB with additional command-line flags).
Basically, we treat Window Size as a local Memory management choice that each decoder is allowed to enforce.
"Limited" decoders, with a lower supported Window Size, can exist, even with very low limits such as 32 KB for example, and they should obviously not be expected to decode any random zstd frame. This is fine, as such decoders are typically deployed in controlled environments, where the sources are known and can be made to respect this condition. It can even be considered a "security" feature, since it denies any unexpected source (or a bug) the ability to swamp the receiver with unrealistic memory requirements.
On the other hand, "extended" encoders (and decoders), exploiting more than 8 MB of Window Size, are allowed too. Here also, such frames should not be expected to be decodable by any random zstd decoder, but within a known receiver setting, it's perfectly fine. The reference zstd CLI is one of those extended decoders, and it's reasonable to target it. But sending frames with large window sizes over the Internet and expecting them to be decoded by Chrome, for example, will not work: there, the 8 MB Window Size limit is strictly enforced.