Support Fractional-Power-of-Two Window Sizes in Compression
This PR adds support for configuring and performing compression with non-power-of-two window sizes.
The frame format and decoder have always supported more window sizes than just powers of two. The frame format defines a 3 bit window mantissa field in addition to the 4 bit window log descriptor. However, the compressor has only ever supported power-of-two windows. This PR changes that. In order to do that, it makes a number of changes:
- It separates the public and private CParams struct definitions.
- It adds a
windowFracfield to the private CParams (but not to the public one, because we are scared of changing that struct definition. - It adds a new CCtx Param
ZSTD_c_windowFracwhich allows users to set this param.
To-Do:
- [x] Testing.
- [x] CLI Support.
I agree that it would have been more expedient to just put it in the CCtxParams, but it seems bad to separate these highly interdependent variables.
In this future where the Zstd compression side supports fractional window sizes, it is a bug if you look at just the windowLog and don't look at the windowFrac at the same time. I worry that if they're not right next to each other, it is both mechanically harder to make both available to all the places that interact with window sizing, it is also easier to make mistakes as a developer and look only at the windowLog. (Similarly, if you look closely at the code for setting these values, there are nuances that require that they interact.)
And, as we discussed, the match-finders need access to this parameter, so it needs to be available from the MatchState, which currently only has the CParams, and not the CCtxParams.
I considered an alternate approach where, rather than introduce a new variable, we change the internal windowLog to be a "windowDesc" which carries both components, as, e.g., windowLog << 3 | windowFrac. But then even if you didn't have to change the struct shape, you'd still have to translate the values of the internal cParams with the public ones, and it's actually harder to validate that you're doing that correctly if they are the same struct.
Just some minor naming suggestions.
zstd_compress.c is relatively straightforward to review, compared to #4305, because the code structure remains the same. So no comment there.