Improve block type selection algorithm
I am developing a library that uses data compression, and for that task I chose the flate2 crate with its default compression backend, miniz_oxide.
While writing unit tests for my library, I noticed that a particular piece of data was not compressing as expected. After further investigation, I found that the compressed output consisted of a single stored (non-compressed) block. The same data compressed with zlib, however, produces a different result: a single compressed block that is smaller than the input.
This can be verified using the code snippet below.
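```rust
use std::io::Read;

use flate2::read::DeflateEncoder;
use flate2::Compression;

#[test]
fn it_compress_issue() {
    let data = r#"{"status":"success","data":{"messageId":"mg9x9vCqYMg9YtKdDwQx"}}"#.as_bytes();

    // Compression using the 'miniz_oxide' crate directly (raw deflate, level 9)
    let compressed_data = miniz_oxide::deflate::compress_to_vec(data, 9);
    // The output is larger than the input: a single stored block, i.e. 5 bytes of
    // framing (block header byte + LEN + NLEN) followed by the input verbatim
    assert!(compressed_data.len() > data.len());
    assert_eq!(&compressed_data[5..], data);

    // Compression using the 'flate2' crate with the 'zlib' feature enabled
    let mut enc = DeflateEncoder::new(data, Compression::default());
    let mut compressed_data_2 = Vec::new();
    enc.read_to_end(&mut compressed_data_2).unwrap();
    // zlib emits a compressed block that is smaller than the input
    assert!(compressed_data_2.len() < data.len());
}
```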
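The `[5..]` slice in the test relies on the layout of a raw DEFLATE stored block (RFC 1951, section 3.2.4): one byte carrying BFINAL/BTYPE padded to a byte boundary, then LEN and NLEN as little-endian 16-bit values, then the data itself. The sketch below builds such a block by hand to show where the 5 bytes of overhead come from; `stored_block` is a hypothetical helper, not part of miniz_oxide or flate2, and I'm only assuming it matches what miniz_oxide emitted here because the test's assertion passes.

```rust
/// Hypothetical helper: wraps `data` in a single, final raw-deflate stored block.
/// Only valid for inputs of at most 65535 bytes (one stored block).
fn stored_block(data: &[u8]) -> Vec<u8> {
    assert!(data.len() <= u16::MAX as usize, "a stored block holds at most 65535 bytes");
    let len = data.len() as u16;
    let mut out = Vec::with_capacity(data.len() + 5);
    // Bits are packed LSB first: BFINAL = 1, BTYPE = 00 (stored),
    // and the remaining 5 bits are padding up to the byte boundary.
    out.push(0b0000_0001);
    out.extend_from_slice(&len.to_le_bytes()); // LEN
    out.extend_from_slice(&(!len).to_le_bytes()); // NLEN, one's complement of LEN
    out.extend_from_slice(data);
    out
}

fn main() {
    let data = br#"{"status":"success","data":{"messageId":"mg9x9vCqYMg9YtKdDwQx"}}"#;
    let block = stored_block(data);
    // 5 bytes of framing overhead, then the input verbatim
    assert_eq!(block.len(), data.len() + 5);
    assert_eq!(&block[5..], &data[..]);
    println!("stored block: {} bytes for {} bytes of input", block.len(), data.len());
}
```

Since a stored block can only ever be larger than its input, it is only the right choice when neither Huffman encoding manages to beat it.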
This might be related to issue #77.
Yeah, it might be similar to what causes the differences in #77: the block selection algorithm being a bit too dumb. You could check whether you get the same result with the C miniz backend (or C miniz with the same settings).
Yeah, I looked at it a bit; it's due to the simpler block selection algorithm in miniz_oxide (and C miniz). I may change it to do a more thorough check like zlib does, though that requires a little restructuring.
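For reference, a "more thorough check" in the zlib style roughly means estimating the encoded size of the pending block for every block type and picking the cheapest. The sketch below is loosely modelled on the comparison in zlib's `_tr_flush_block` (trees.c); `choose_block_type` and its parameters are hypothetical, the bit estimates are assumed to be computed elsewhere (with the dynamic estimate already including the cost of sending the trees), and this is not miniz_oxide's actual code.

```rust
/// The three DEFLATE block types (RFC 1951, section 3.2.3).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum BlockType {
    Stored,
    FixedHuffman,
    DynamicHuffman,
}

/// Hypothetical helper: pick the block type with the smallest estimated output.
/// `stored_len` is the raw input length in bytes; `fixed_bits` and `dynamic_bits`
/// are estimated encoded sizes in bits (dynamic includes the tree description).
fn choose_block_type(stored_len: usize, fixed_bits: usize, dynamic_bits: usize) -> BlockType {
    // Round each estimate up to whole bytes, including the 3-bit block header.
    let fixed_bytes = (fixed_bits + 3 + 7) / 8;
    let dynamic_bytes = (dynamic_bits + 3 + 7) / 8;
    let best_huffman = fixed_bytes.min(dynamic_bytes);
    // A stored block costs the raw data plus 4 bytes of LEN/NLEN
    // (header bits and alignment padding are ignored, as zlib does).
    if stored_len + 4 <= best_huffman {
        BlockType::Stored
    } else if fixed_bytes <= dynamic_bytes {
        BlockType::FixedHuffman
    } else {
        BlockType::DynamicHuffman
    }
}

fn main() {
    // Made-up estimates for a short input: the dynamic trees are too expensive,
    // but fixed Huffman codes still beat storing the data verbatim.
    let choice = choose_block_type(64, 400, 700);
    println!("{:?}", choice);
}
```

With these made-up numbers the fixed block comes to 51 bytes against 68 for a stored block and 88 for a dynamic one, so `FixedHuffman` wins. The key point for short inputs like the JSON above is that the fixed-Huffman option is weighed against the stored one instead of falling back to a stored block as soon as the dynamic estimate loses.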