EasyCompressor
Deterministic compression
See this article here: https://dramsch.net/today-i-learned/gzip/today-i-learned-about-deterministic-gzip-compression/
I'd like to do the same thing but it appears EasyCompressor doesn't expose the necessary options to make GZip deterministic.
After looking into this a bit more, it appears that none of the EasyCompressor formats expose the necessary options to make them deterministic. We should be able to set the timestamp to 0 like in that article and get the exact same compressed output for the same decompressed input. If there's randomness, we should be able to control the seed.
That isn't relevant to this library.
The -n parameter of the gzip command tells it not to store the timestamp of the original file.
That only matters when compressing files.
This library compresses and decompresses data (such as a byte[] or a stream), not files,
and that data has no timestamp.
OK well, I found in practice that EasyCompressor output is non-deterministic. Can any of it be made deterministic?
Actually, it is deterministic out of the box. Because this library works with data (not files), there is no timestamp to include, so the output is always deterministic. For example, if you compress the same unchanged data many times, the compressed outputs (and their hashes) will be identical.
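The check described above can be sketched as follows. This is an illustration using Python's standard gzip module (as in the linked article), not EasyCompressor's API. The helper name gzip_digest is hypothetical, and the mtime keyword assumes Python 3.8 or newer.

```python
import gzip
import hashlib


def gzip_digest(data: bytes, mtime=None) -> str:
    """Return the SHA-256 hex digest of the gzip-compressed form of data.

    Hypothetical helper for illustration. With mtime=None, Python writes the
    current time into the gzip header, so digests may differ between runs.
    """
    compressed = gzip.compress(data, mtime=mtime)
    return hashlib.sha256(compressed).hexdigest()


payload = b"the same unchanged data" * 100

# Pin the header timestamp to 0 so the only input is the payload itself.
first = gzip_digest(payload, mtime=0)
second = gzip_digest(payload, mtime=0)

print(first == second)  # True
```

This only shows that repeated runs on one runtime agree. It says nothing about whether two different platforms or compression implementations produce the same bytes.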
Oh, I think I know what happened.
I ran one test on Blazor WASM and another on Windows and got different results.
Maybe it has something to do with the platform.
I reproduced your example with different compressors and found a weird difference in the GZip compressed output between server-side .NET and client-side (Blazor WASM).
Brotli is not supported on Blazor WASM, and the other algorithms (Deflate, LZ4, LZMA, Zstd, and Snappy) produce the same output on the server and in the browser.
GZip compressed output (and thereby its hash) is different between server and client. However, the uncompressed data is equal to the original data before compression.
I need to investigate further to find out whether it's an implementation mistake in this library or a bug in the .NET runtime.
The Repo: https://github.com/mjebrahimi/BlazorWebAssembly-GZip-Difference
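One way to narrow down where two GZip outputs differ is to decode the fixed 10-byte header that RFC 1952 defines and to find the first differing byte. The sketch below is in Python and makes no claim about the actual cause of the difference reported here. The function names describe_gzip_header and first_difference are hypothetical, and the two outputs being compared would have to be captured from each runtime separately.

```python
import gzip
import struct


def describe_gzip_header(blob: bytes) -> dict:
    """Decode the fixed 10-byte gzip member header (RFC 1952)."""
    if len(blob) < 10:
        raise ValueError("too short to be a gzip stream")
    # ID1, ID2, CM, FLG, MTIME (little-endian uint32), XFL, OS
    id1, id2, method, flags, mtime, xfl, os_code = struct.unpack(
        "<BBBBIBB", blob[:10]
    )
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("missing gzip magic bytes")
    return {
        "method": method,  # 8 means deflate
        "flags": flags,
        "mtime": mtime,    # 0 means no timestamp stored
        "xfl": xfl,
        "os": os_code,     # identifies the system that wrote the stream
    }


def first_difference(a: bytes, b: bytes):
    """Return the index of the first differing byte, or None if identical."""
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return index
    return None if len(a) == len(b) else min(len(a), len(b))


sample = gzip.compress(b"hello", mtime=0)
header = describe_gzip_header(sample)
print(header["method"], header["mtime"])  # 8 0
```

If the first difference falls inside the first 10 bytes, the outputs differ in header metadata. If it falls later, the compressed deflate stream itself differs, which can happen when two runtimes use different compression implementations or settings while still decompressing to the same data.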
Yeah sorry I didn't realize I'd actually done the two tests on different runtimes when I made the initial post. Not a big deal: it just explains why my unit test failed. Thanks. :)
You're welcome. Anyway, it's an interesting problem you found, and I'll post an update in a few days once I've investigated further.
It also seems to affect System.IO.Compression on .NET Standard 2.0, so apparently it isn't specific to EasyCompressor.