flate2-rs icon indicating copy to clipboard operation
flate2-rs copied to clipboard

Enable window_bits and gzip functions for zlib-rs backend

Open Copilot opened this issue 2 weeks ago • 9 comments

Several functions were unnecessarily restricted to C-based zlib backends despite working with pure Rust backends like zlib-rs.

Changes

Changed feature gate from any_zlib to any_impl for:

  • Compress::new_with_window_bits() - Custom window size for compression
  • Compress::new_gzip() - Gzip-formatted compression
  • Decompress::new_with_window_bits() - Custom window size for decompression
  • Decompress::new_gzip() - Gzip-formatted decompression

These functions work across all backends because:

  • Backend trait make() methods already accept window_bits parameter
  • Gzip mode uses standard window_bits + 16 convention supported universally

Dictionary functions (set_dictionary, set_level) remain any_zlib-only as they require direct stream manipulation not exposed in zlib-rs stable API.

Tests

Added validation for newly enabled functions:

  • Window size variations (9, 12, 15)
  • Boundary validation (panics on invalid values)
  • Round-trip compression/decompression
// Now works with zlib-rs backend
let mut encoder = Compress::new_gzip(Compression::default(), 9);
encoder.compress_vec(data, &mut output, FlushCompress::Finish)?;

let mut decoder = Decompress::new_gzip(9);
decoder.decompress(&output, &mut decoded, FlushDecompress::Finish)?;
Original prompt

https://github.com/rust-lang/flate2-rs/issues/517 See how the functions that rely on #[cfg(feature = "any_zlib")] can also be implemented when zlib-rs is set, which doesn't use the C FFI anymore. Add tests that validate that these functions are present with zlib-rs. If one of these can't be implemented with zlib-rs, that's fine too.


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot avatar Dec 07 '25 08:12 Copilot

@folkertdev I'd need your help here 🙏.

I think this PR shows that set_dictionary and set_level currently aren't supported for zlib-rs because they need more access. Is this something you think zlib-rs will support (or does already?) so we can merge this PR and have a release that isn't considered breaking? Thanks again for your help.

Byron avatar Dec 07 '25 08:12 Byron

I would have expected this PR to break CI given that zlib-rs does not currently expose the API to implement those features?

I'll need to make a quick release exposing some additional things. We have the functionality, just not the API.

folkertdev avatar Dec 08 '25 13:12 folkertdev

I would have expected this PR to break CI given that zlib-rs does not currently expose the API to implement those features?

Actually the test doesn't check for set_dictionary and set_level for that reason, but everything else is exposed.

Once we have access to these via the Rust API, the test can be adjusted to also validate them.

In theory, it could then be copy-pasted for each of the other implementations.

In any case, thanks for your help, it's much appreciated!

Byron avatar Dec 09 '25 04:12 Byron

Some fun kind of undocumented behavior: the native zlib backends count the dictionary length towards total_in, but the flate2 bookkeeping does not (it only counts input to compress and decompress). Is that deviation deliberate?

folkertdev avatar Dec 09 '25 12:12 folkertdev

Maybe @jongiddy has more context, as this question is way over my head.

Knowing that some models as built into the GH chat window have access to a lot of context and can do some thinking/planning, I thought maybe sharing their output here can be useful. Maybe not, I really can't even decide that 😅.

GPT 4.1 Mini

Short answer: probably a deliberate bookkeeping choice in flate2 rather than a spec bug in zlib. zlib's C implementation treats the preset dictionary as part of the stream history and increments its internal total_in/total_out counters to reflect that the stream now "contains" those bytes; flate2's counters reflect only the bytes that were actually passed to the compress()/decompress() calls by the caller (i.e., explicit input), and so do not add in the implicit dictionary length.

Why that difference exists (summary)

  • zlib C API semantics: when you give zlib a preset dictionary (deflateSetDictionary / inflateSetDictionary) the library internally takes that dictionary into account as stream history; the library’s total_in may be updated to include the dictionary length because the library views those bytes as part of the stream state.
  • flate2 bookkeeping: flate2 exposes total_in/total_out that count bytes passed through its API, which excludes any bytes that were not actually passed to compress()/decompress() (so a dictionary you hand in separately is excluded).
  • Result: two backends that both produce correct compressed output may still report different total_in values because they're answering slightly different questions about what "total input" means.

Is it deliberate?

  • It’s plausible and reasonable that flate2 authors intentionally count only the explicit API input (that keeps the meaning clear: “how many bytes did you hand me in these calls?”).
  • It could also be an implementation divergence (or an oversight) between backends — the important point is that the semantics are not identical across backends today.

What I recommend

  1. If you need consistent semantics in your code, treat flate2’s total_in as “bytes passed to the API” and, when you need parity with zlib’s native behavior, add dict.len() yourself to the reported total_in when you supplied a preset dictionary to the native backend.
  2. If you think flate2 should document or change this behavior (or at least make it consistent across backends), open an issue on the flate2 repository with a minimal repro and the exact backend/feature flags you used (native zlib vs pure-Rust backend).
  3. When opening the issue, include:
    • The flate2 version and the backend feature you used (native zlib vs miniz_oxide / pure Rust).
    • A tiny reproducible program: compress or decompress with an explicit preset dictionary, and print total_in from both backends so maintainers can reproduce the discrepancy.
    • The exact numbers you observed and what you expected.
    • Why you care about total_in including/excluding the dictionary (e.g., for progress reporting, accounting, or compatibility with other tools).

Suggested issue title/body (you can paste/edit this):

Title: total_in behavior differs from system zlib when using a preset dictionary

Body: Summary: When using a preset dictionary, the system zlib backend increases zlib's total_in by the dictionary length, but flate2's bookkeeping (the value returned from total_in()) does not include the dictionary length. This makes total_in differ across backends.

Repro:

  • flate2 version:
  • Backends tested: system zlib (feature X), pure-Rust backend (feature Y)
  • Minimal steps: (1) set a preset dictionary, (2) call compress/decompress on some data, (3) read total_in().

Observed:

  • system zlib backend: total_in == bytes_passed + dict.len()
  • flate2 bookkeeping: total_in == bytes_passed

Expected / question: Should flate2:

  • document that total_in excludes preset dictionary length, or
  • change behavior to include the dictionary length for parity with system zlib, or
  • expose both values (explicit input vs stream-history-inclusive total)?

Workaround: Add dict.len() to total_in yourself when you need the dictionary counted.

If you’d like, I can help you draft a minimal reproducible Rust snippet to include in the issue (or a short test) showing the differing totals for the two backends. Which backend(s) and flate2 version are you testing?

Byron avatar Dec 09 '25 16:12 Byron

With the arrival of #522 I think we can either merge this PR as is, or put #522 on top of this one, and merge everything together. My hope is that we can do one after another. Could @jongiddy take a look and see what's needed to get this merged? Thanks again 🙏

Byron avatar Dec 10 '25 04:12 Byron

Can we split any_zlib into "supports the zlib api" and "uses a C implementation" flags?

  • any_zlib_api: any backend that is able to provide something consistent with the full zlib api (so excludes miniz_oxide, but includes zlib-rs)
  • any_c_zlib: uses ffi/c.rs as the implementation, configures out the other implementations.

folkertdev avatar Dec 10 '25 11:12 folkertdev

Absolutely. I also believe that these any_* feature toggles should be made to work for us, as long as we don't lose functionality that existed before.

From what I see, this should only involve zlib-rs as its C bindings supported something that the Rust API now has to re-expose. Interestingly, now it seems we are at risk to lose functionality that we had before by changing how the flags work, and I'd love it if we could maybe tune the zlib-rs-capabilities into a test that runs for each backend that supports the whole set of 'extras', probably with precautions for miniz_oxide to not fail due to window_bits.

Byron avatar Dec 11 '25 03:12 Byron

Hmm well I made the current setup work, and the split would probably not really help that much.

folkertdev avatar Dec 11 '25 10:12 folkertdev

With these changes, I think this PR could be ready to go. It makes clear which 'special methods' are supported, and basically makes clear that all backends support everything, except for miniz_oxide which does not support set_level() or set_dictionary().

Probably it was for that reason that I invited @oyvindln as a reviewer as well, even though this PR doesn't touch the miniz_oxide implementation - please feel free to ignore.

When merged, I'd try another release, so I took the liberty to bump the version once more.

Byron avatar Dec 14 '25 13:12 Byron

The miniz_oxide backend doesn't support window_bits. Regardless of what value you pass, it will always create zlib bitstreams with the header indicating window_bits=15.

Before this PR, the new_with_window_bits methods were gated behind the necessary feature flags so that they would only be exposed on backends which respected the value. That seems preferable to sometimes silently ignoring the argument?

fintelia avatar Dec 14 '25 18:12 fintelia

Thanks a lot for the review @fintelia and sorry for not getting it earlier. Now the 'capabilities' test shows that all the zlib related things are basically unsupported by miniz_oxide, and are supported by zlib-rs.

I'd love to do another refactor of the tests in general, but I will wait to not make reviewing this PR harder. Ideally we conclude that it doesn't break anything, so we can finally have a release that makes direct use of zlib-rs which, to my mind, brings it into position to becoming the default backend.

Byron avatar Dec 15 '25 08:12 Byron