
gzip algorithm(s)

Open annevk opened this issue 4 years ago • 7 comments

I'd love to see this be as deterministic as our text encoding setup, even if it needs to evolve over time somehow.

annevk avatar Aug 29 '19 07:08 annevk

We used to have a test in Chromium that relied on fixed compressed output for a particular input, but the people who are trying to improve our gzip implementation complained, so we removed it. It's under active development, mainly because it's important for PNG performance.
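As a rough illustration of the difference between the two kinds of test (this is a sketch using Python's standard `zlib` module, not the Chromium test itself; `round_trips` and `sample` are made-up names): a round-trip check holds for any conforming compressor, while a comparison against captured bytes only holds for one build at one setting.

```python
import zlib


def round_trips(data: bytes, level: int) -> bool:
    """Return True if compressing and then decompressing restores the input."""
    return zlib.decompress(zlib.compress(data, level)) == data


# Hypothetical sample input; any byte string would do.
sample = b"compression streams " * 256

# Robust check: independent of how the compressor chooses to encode the data.
for level in (1, 6, 9):
    assert round_trips(sample, level)

# Fragile check: the exact bytes depend on the compression level, the library
# version, and its tuning, so a stored "expected output" can go stale.
fast = zlib.compress(sample, 1)
best = zlib.compress(sample, 9)
print(len(fast), len(best), fast == best)
```

Both outputs decompress to the same input, which is the only property a portable test can rely on.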

One of the attractive things about CompressStream is that we're just exposing an interface to compression code we're already shipping, so I'd prefer not to have to add a second implementation.

There's precedent of a sort in that the canvas.toBlob() method already exposes the algorithm we're using when it's used to output a PNG.

ricea avatar Aug 29 '19 11:08 ricea

I'd be rather worried about compatibility fallout, also with that canvas API.

annevk avatar Aug 29 '19 14:08 annevk

Based on my experience, I think a dependency on the actual byte output of a compression algorithm is rare, so the risk is low.

I consider the benefit of browsers being able to use different underlying algorithms, change optimisations, or even replace the library altogether to outweigh the compatibility risk.

I am keeping this issue open to collect additional data points. If, for example, we end up requiring everyone to use zlib then the cost/benefit tradeoff changes.

ricea avatar Sep 03 '19 07:09 ricea

Do all browsers already use zlib in some observable way? (I'm reminded somewhat of the SQLite debacle though this seems less severe.)

annevk avatar Sep 04 '19 06:09 annevk

@annevk It's my understanding that Microsoft Edge doesn't use zlib. It has its own implementation of deflate (and gzip). As far as I know, once Edge moves to Chromium all the evergreen browsers tracked by MDN will be using some version of zlib.

ricea avatar Sep 05 '19 05:09 ricea

It seems to me we should standardize the algorithm then as otherwise we create an implicit dependency upon zlib rather than a standard everyone can implement.

annevk avatar Sep 10 '19 08:09 annevk

My recollection is that we (Microsoft) do use a version of zlib under the covers, but it is an older version which has been slightly modified/recompiled in such a way that it satisfies Microsoft's various code sanitizer/analyzer tools.

Having said that, I feel pretty strongly that we shouldn't attempt to make a compression API be byte-output deterministic, as this will significantly increase complexity and preclude improvements in compression rate over time.

It's also not clear to me that zlib is fully deterministic today (there's a header byte that records which OS the compressor ran on). Improved versions of DEFLATE compression (e.g. Zopfli) can be non-deterministic due to corner cases in floating-point math. Matters get even more complicated when you consider that the compressor might generate different block sizes depending on flushes.
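A minimal sketch of two of these points, using Python's standard `zlib` bindings as a stand-in for whatever a browser ships (the helper names and the `payload` value are illustrative assumptions). In the gzip container (RFC 1952) the tenth header byte is the OS field, and a flush in the middle of the input changes the DEFLATE block structure even though the decompressed data is identical.

```python
import zlib

# Hypothetical input used only for this illustration.
payload = b"the same input, compressed in two different ways. " * 100


def gzip_os_byte(data: bytes) -> int:
    """Compress to a gzip container and return the header's OS byte (offset 9)."""
    compressor = zlib.compressobj(wbits=31)  # wbits=31 selects the gzip container
    out = compressor.compress(data) + compressor.flush()
    return out[9]


def deflate_one_shot(data: bytes) -> bytes:
    """Compress everything with a single final flush."""
    compressor = zlib.compressobj()
    return compressor.compress(data) + compressor.flush()


def deflate_with_mid_flush(data: bytes) -> bytes:
    """Compress in two halves, forcing a sync flush between them."""
    half = len(data) // 2
    compressor = zlib.compressobj()
    out = compressor.compress(data[:half])
    out += compressor.flush(zlib.Z_SYNC_FLUSH)  # ends the current block early
    out += compressor.compress(data[half:])
    out += compressor.flush()
    return out


one_shot = deflate_one_shot(payload)
flushed = deflate_with_mid_flush(payload)

# The value depends on the platform and on how the library was built.
print("gzip OS byte:", gzip_os_byte(payload))
print("same bytes:", one_shot == flushed)
print("same content:", zlib.decompress(one_shot) == zlib.decompress(flushed))
```

With a streaming API, where chunk boundaries and flush points depend on how the caller writes data, the second effect would show up even with a single fixed library version.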

ericlaw1979 avatar Sep 16 '19 06:09 ericlaw1979