hdf5-rust
Blosc filters have no effect
Creating a dataset with any of the blosc filters compiles and runs with no errors, but does not compress the data at all. If I use lzf or szip instead, the dataset is compressed as expected.
Just to be clear, the filter does appear to be applied (looking at the output of h5dump), but there is no compression.
Are there any external dependencies needed for blosc to work?
Here is a minimal example:
use hdf5::filters;
use ndarray::Array2;
use std::env::temp_dir;

fn main() -> anyhow::Result<()> {
    println!("Blosc available? {:}", filters::blosc_available());
    println!("LZF available? {:}", filters::lzf_available());
    println!("SZIP available? {:}", filters::szip_available());

    let path_uncomp = temp_dir().join("uncompressed.h5");
    let path_comp = temp_dir().join("compressed.h5");
    let file_uncomp = hdf5::File::create(&path_uncomp)?;
    let file_comp = hdf5::File::create(&path_comp)?;

    let data = Array2::<f32>::ones((1000, 1000));

    file_uncomp
        .new_dataset_builder()
        .with_data(data.view())
        .create("data")?;

    file_comp
        .new_dataset_builder()
        .blosc_lz4(9, true)
        //.blosc_zstd(9, true)
        //.blosc_snappy(9, true)
        //.lzf()
        //.szip(filters::SZip::NearestNeighbor, 16)
        .with_data(data.view())
        .create("data")?;

    println!(
        "Uncompressed file size: {:} kB",
        path_uncomp.metadata()?.len() / 1024
    );
    println!(
        "Compressed file size: {:} kB",
        path_comp.metadata()?.len() / 1024
    );

    Ok(())
}
Cargo.toml:
[dependencies]
anyhow = "1.0.80"
hdf5 = { git = "https://github.com/aldanor/hdf5-rust.git", features = [
    "blosc",
    "lzf",
] }
ndarray = { version = "0.15.6" }
The output is:
Blosc available? true
LZF available? true
SZIP available? true
Uncompressed file size: 3908 kB
Compressed file size: 3910 kB
Using szip, the compressed file size is 12 kB.
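The numbers above can be sanity-checked with a little arithmetic. The following is a minimal sketch using only the standard library: it computes the raw payload of a 1000 x 1000 f32 array and flags a filter as ineffective when the file barely shrinks. The helper names and the 1.1 threshold are hypothetical choices for illustration, not part of hdf5-rust.

```rust
/// Raw payload size in bytes of a dense 2-D array.
fn raw_payload_bytes(rows: u64, cols: u64, elem_size: u64) -> u64 {
    rows * cols * elem_size
}

/// Ratio of uncompressed to compressed size.
/// Values close to 1.0 mean the filter had no visible effect.
fn compression_ratio(uncompressed: u64, compressed: u64) -> f64 {
    uncompressed as f64 / compressed as f64
}

/// Hypothetical check: treat the filter as effective only if the file
/// shrank by at least the factor `min_ratio`.
fn filter_looks_effective(uncompressed: u64, compressed: u64, min_ratio: f64) -> bool {
    compressed > 0 && compression_ratio(uncompressed, compressed) >= min_ratio
}

fn main() {
    // 1000 x 1000 values of 4 bytes each, as in the example above.
    let payload = raw_payload_bytes(1000, 1000, 4);
    println!("raw payload: {} bytes ({} kB)", payload, payload / 1024);

    // File sizes in kB as reported in this thread.
    let cases = [("blosc_lz4", 3910u64), ("szip", 12u64)];
    for (label, compressed_kb) in cases.iter() {
        let ok = filter_looks_effective(3908, *compressed_kb, 1.1);
        println!("{}: effective = {}", label, ok);
    }
}
```

The raw payload is 4,000,000 bytes (about 3906 kB), so the 3908 kB uncompressed file is essentially payload plus a small amount of metadata, and the 3910 kB "compressed" file contains the data unfiltered.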
This would happen if the compressor is not available for blosc. If one specifies --features blosc-src/lz4,blosc-src/zlib, one gets down to 19 kB with the blosc-lz4 filter and 8 kB with blosc-zlib.
It is unfortunate that we skip the filter instead of raising an error when we try to apply it and it is not available. Setting the flag at https://github.com/aldanor/hdf5-rust/blob/4a9b537f0c7ba3f75712ba240fe9ffeb1fd9447e/hdf5/src/hl/filters.rs#L472 to mandatory would produce such an error.
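To illustrate the difference between the two flag values, here is a toy model in plain Rust. It is not the hdf5-rust or HDF5 implementation: `FilterFlag`, `write_chunk`, and `missing_codec` are hypothetical names, and the model only mirrors the behaviour described here, where a failing optional filter is skipped silently while a failing mandatory filter turns into an error.

```rust
/// Toy stand-in for the filter flag; not the real HDF5 or hdf5-rust type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum FilterFlag {
    Optional,
    Mandatory,
}

/// Toy write step. `filter` returns `None` when it cannot run,
/// for example because the requested codec was not compiled in.
fn write_chunk(
    chunk: &[u8],
    flag: FilterFlag,
    filter: impl Fn(&[u8]) -> Option<Vec<u8>>,
) -> Result<Vec<u8>, String> {
    match (filter(chunk), flag) {
        // Filter succeeded: store its output.
        (Some(out), _) => Ok(out),
        // Optional filter failed: store the chunk unfiltered, no error.
        (None, FilterFlag::Optional) => Ok(chunk.to_vec()),
        // Mandatory filter failed: the write fails.
        (None, FilterFlag::Mandatory) => {
            Err("filter failed and is marked mandatory".to_string())
        }
    }
}

/// Stand-in for a blosc filter whose codec is missing.
fn missing_codec(_chunk: &[u8]) -> Option<Vec<u8>> {
    None
}

fn main() {
    let chunk = vec![1u8; 16];

    let optional = write_chunk(&chunk, FilterFlag::Optional, missing_codec);
    println!("optional:  {:?}", optional.map(|bytes| bytes.len()));

    let mandatory = write_chunk(&chunk, FilterFlag::Mandatory, missing_codec);
    println!("mandatory: {:?}", mandatory.map(|bytes| bytes.len()));
}
```

In this model the optional case stores all 16 bytes unchanged, which matches the symptom in this issue: the file is written without complaint but is not smaller.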
I see, thank you. I added blosc-src = { version = "0.3.0", features = ["lz4", "zlib", "zstd"] } to Cargo.toml to make it work. May I suggest adding this to the documentation of the blosc_ functions?
Agreed that an error would be great in this case, or maybe even a more in-depth function like blosc_available that would return which of the blosc filters are available.
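A rough sketch of what such a check could look like, assuming the list of compiled-in codecs is available as a comma-separated string (the C-Blosc API reports it in that form via blosc_list_compressors). Whether and how hdf5-rust would expose that string is not settled here, so the sketch takes it as a plain argument; `parse_compressor_list` and `missing_compressors` are hypothetical helpers, and the input in `main` is only an illustrative value.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: split a comma-separated codec list
/// (e.g. "blosclz,lz4,zlib") into a set of lowercase names.
fn parse_compressor_list(list: &str) -> BTreeSet<String> {
    list.split(',')
        .map(|name| name.trim().to_ascii_lowercase())
        .filter(|name| !name.is_empty())
        .collect()
}

/// Hypothetical helper: return the wanted codecs that are not available.
fn missing_compressors<'a>(
    available: &BTreeSet<String>,
    wanted: &[&'a str],
) -> Vec<&'a str> {
    wanted
        .iter()
        .copied()
        .filter(|name| !available.contains(*name))
        .collect()
}

fn main() {
    // Illustrative input only: a build that reports a single codec.
    let available = parse_compressor_list("blosclz");

    let missing = missing_compressors(&available, &["lz4", "zlib", "zstd"]);
    if missing.is_empty() {
        println!("all requested blosc codecs are available");
    } else {
        println!("missing blosc codecs: {}", missing.join(", "));
    }
}
```

A check of this kind, run before creating the dataset, would let a caller fail early instead of discovering an uncompressed file afterwards.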