zstd icon indicating copy to clipboard operation
zstd copied to clipboard

Support Fractional-Power-of-Two Window Sizes in Compression

Open felixhandte opened this issue 11 months ago • 2 comments

This PR adds support for configuring and performing compression with non-power-of-two window sizes.

The frame format and decoder have always supported more window sizes than just powers of two. The frame format defines a 3 bit window mantissa field in addition to the 4 bit window log descriptor. However, the compressor has only ever supported power-of-two windows. This PR changes that. In order to do that, it makes a number of changes:

  • It separates the public and private CParams struct definitions.
  • It adds a windowFrac field to the private CParams (but not to the public one, because we are scared of changing that struct definition.
  • It adds a new CCtx Param ZSTD_c_windowFrac which allows users to set this param.

To-Do:

  • [x] Testing.
  • [x] CLI Support.

felixhandte avatar Jan 30 '25 21:01 felixhandte

I agree that it would have been more expedient to just put it in the CCtxParams, but it seems bad to separate these highly interdependent variables.

In this future where the Zstd compression side supports fractional window sizes, it is a bug if you look at just the windowLog and don't look at the windowFrac at the same time. I worry that if they're not right next to each other, it is both mechanically harder to make both available to all the places that interact with window sizing, it is also easier to make mistakes as a developer and look only at the windowLog. (Similarly, if you look closely at the code for setting these values, there are nuances that require that they interact.)

And, as we discussed, the match-finders need access to this parameter, so it needs to be available from the MatchState, which currently only has the CParams, and not the CCtxParams.

I considered an alternate approach where, rather than introduce a new variable, we change the internal windowLog to be a "windowDesc" which carries both components, as, e.g., windowLog << 3 | windowFrac. But then even if you didn't have to change the struct shape, you'd still have to translate the values of the internal cParams with the public ones, and it's actually harder to validate that you're doing that correctly if they are the same struct.

felixhandte avatar Feb 04 '25 15:02 felixhandte

Just some minor naming suggestions.

zstd_compress.c is relatively straightforward to review, compared to #4305, because the code structure remains the same. So no comment there.

Cyan4973 avatar Mar 04 '25 19:03 Cyan4973