miniz_oxide Improve block type selection algorithm

I am currently developing a library that uses data compression, and to handle that task I have chosen to use the flate2 crate with its default compression backend, miniz_oxide.

While writing unit tests for my library, I noticed that a particular piece of data did not compress as expected. On further investigation, I found that the output consisted of a single non-compressed (stored) data block, which is larger than the input. The same data compressed with zlib, however, produces a single compressed data block.

This can be verified using the code snippet below.

    use flate2::Compression;
    use std::io::Read;

    #[test]
    fn it_compress_issue() {
        let data = r#"{"status":"success","data":{"messageId":"mg9x9vCqYMg9YtKdDwQx"}}"#.as_bytes();

        // Compression using 'miniz_oxide' crate directly
        let compressed_data = miniz_oxide::deflate::compress_to_vec(data, 9);

        // The output is larger than the input...
        assert!(compressed_data.len() > data.len());
        // ...because it is one stored block: a 5-byte header followed by the raw input
        assert_eq!(&compressed_data.as_slice()[5..], data);

        // Compression using 'flate2' crate with 'zlib' feature enabled
        let mut enc = flate2::read::DeflateEncoder::new(data, Compression::default());
        let mut compressed_data_2 = Vec::new();

        enc.read_to_end(&mut compressed_data_2).unwrap();

        // With the zlib backend the output is smaller than the input
        assert!(compressed_data_2.len() < data.len());
    }
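
The `[5..]` offset in the test above follows from the stored-block layout in RFC 1951: one byte holding the BFINAL bit and the 2-bit BTYPE field (padded to a byte boundary), then the 2-byte LEN and 2-byte NLEN fields. As a rough sketch, the block type of the first block in a raw DEFLATE stream (no zlib or gzip wrapper) can be read from the low bits of the first byte. The names `BlockType` and `first_block_header` are made up for this illustration and are not part of miniz_oxide or flate2.

```rust
/// Block types defined by the DEFLATE format (RFC 1951, section 3.2.3).
/// Hypothetical helper type, not part of any crate.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum BlockType {
    Stored,
    FixedHuffman,
    DynamicHuffman,
    Reserved,
}

/// Reads the header of the first block of a raw DEFLATE stream.
/// Returns `(is_final, block_type)`, or `None` if the stream is empty.
fn first_block_header(stream: &[u8]) -> Option<(bool, BlockType)> {
    let first = *stream.first()?;
    // Bit 0 is BFINAL, bits 1..=2 are BTYPE (DEFLATE packs bits LSB first).
    let is_final = first & 0b1 == 1;
    let block_type = match (first >> 1) & 0b11 {
        0b00 => BlockType::Stored,
        0b01 => BlockType::FixedHuffman,
        0b10 => BlockType::DynamicHuffman,
        _ => BlockType::Reserved,
    };
    Some((is_final, block_type))
}
```

Calling `first_block_header(&compressed_data)` on the two outputs from the test would be one way to confirm which block type each backend chose, assuming both are raw DEFLATE streams as they are here.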

This might be related to issue #77, I guess.

Dec 18 '20 11:12 claudiosdc

Yeah, it might be similar to what causes the differences in #77: the block selection algorithm being a bit too dumb. You could check by seeing whether you get the same result with the C miniz backend (or C miniz with the same settings).

Dec 22 '20 23:12 oyvindln

Yeah, looked at it a bit, it's due to the simpler block selection algorithm in miniz_oxide (and C miniz). May change it to do a more thorough check like zlib, though it requires a little restructuring.
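
For illustration only, a more thorough check could compare the estimated size of each block type and pick the smallest. The sketch below is not the miniz_oxide or zlib implementation; `BlockChoice`, `choose_block`, and its parameters are hypothetical, and it assumes the encoder already has bit-cost estimates for the fixed and dynamic Huffman encodings and that a stored block starts on a byte boundary.

```rust
/// Which kind of DEFLATE block to emit. Hypothetical type for this sketch.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum BlockChoice {
    Stored,
    Fixed,
    Dynamic,
}

/// Picks the block type with the smallest estimated size in bytes.
///
/// `stored_len` is the raw input length in bytes; `fixed_bits` and
/// `dynamic_bits` are assumed estimates of the encoded payload in bits,
/// excluding the 3-bit block header.
fn choose_block(stored_len: usize, fixed_bits: usize, dynamic_bits: usize) -> BlockChoice {
    // Add the 3 header bits and round up to whole bytes.
    let fixed_bytes = (fixed_bits + 3 + 7) / 8;
    let dynamic_bytes = (dynamic_bits + 3 + 7) / 8;
    // Stored block: 1 header byte (assuming byte alignment) + LEN + NLEN + raw data.
    let stored_bytes = stored_len + 5;

    if stored_bytes <= fixed_bytes.min(dynamic_bytes) {
        BlockChoice::Stored
    } else if fixed_bytes <= dynamic_bytes {
        BlockChoice::Fixed
    } else {
        BlockChoice::Dynamic
    }
}
```

With a check along these lines, a short input would only fall back to a stored block when neither Huffman encoding is estimated to be smaller.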

Jan 02 '21 21:01 oyvindln