exrs
Benchmark against the C implementation of OpenEXR
What can be improved or is missing?
Provide benchmarks comparing the performance of this crate to the OpenEXR reference implementation.
Implementation Approach
The openexr crate provides high-level, mostly safe bindings to the C implementation.
It would be interesting to have benchmarks on x86 as well as ARM.
I've tried running the benchmarks from #181 on a 16-core Ampere Altra machine from Google Cloud, and exrs in parallel mode absolutely rips :rocket:
running 3 tests
test read_image_rgba_f32_to_f16 ... bench: 32,518,506 ns/iter (+/- 176,580)
test read_image_rgba_f32_to_f32 ... bench: 12,701,451 ns/iter (+/- 205,549)
test read_image_rgba_f32_to_u32 ... bench: 13,428,159 ns/iter (+/- 93,722)
test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured
Running benches/profiling.rs (target/release/deps/profiling-ddc84dc3d9a8e9fd)
running 2 tests
test read_single_image_all_channels ... bench: 22,543,779 ns/iter (+/- 598,279)
test read_single_image_from_buffer_all_channels ... bench: 19,874,611 ns/iter (+/- 439,578)
test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured
Running benches/read.rs (target/release/deps/read-35e4db800494d5a6)
running 8 tests
test read_single_image_rle_all_channels ... bench: 23,277,019 ns/iter (+/- 3,901,982)
test read_single_image_rle_non_parallel_all_channels ... bench: 33,362,265 ns/iter (+/- 293,883)
test read_single_image_rle_non_parallel_rgba ... bench: 36,240,049 ns/iter (+/- 247,417)
test read_single_image_rle_rgba ... bench: 16,579,403 ns/iter (+/- 301,560)
test read_single_image_uncompressed_non_parallel_rgba ... bench: 12,898,483 ns/iter (+/- 171,469)
test read_single_image_uncompressed_rgba ... bench: 13,151,095 ns/iter (+/- 162,987)
test read_single_image_zips_non_parallel_rgba ... bench: 137,174,659 ns/iter (+/- 417,130)
test read_single_image_zips_rgba ... bench: 13,942,807 ns/iter (+/- 319,728)
test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured
Running benches/write.rs (target/release/deps/write-35aaf83004c9be4c)
running 5 tests
test write_nonparallel_zip1_to_buffered ... bench: 445,788,833 ns/iter (+/- 2,537,143)
test write_parallel_any_channels_to_buffered ... bench: 31,390,870 ns/iter (+/- 998,324)
test write_parallel_zip16_to_buffered ... bench: 43,083,441 ns/iter (+/- 2,608,785)
test write_parallel_zip1_to_buffered ... bench: 33,796,498 ns/iter (+/- 1,998,909)
test write_uncompressed_to_buffered ... bench: 21,775,658 ns/iter (+/- 413,420)
Decoding a zipped image in 13 milliseconds, how cool is that?
That is, as long as you don't have to do any pixel format conversions and don't run into #178 and #182. Those things really rain on our parade if you try to decode into RGBA f16 like the reference OpenEXR does.
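To put numbers on the parallel speedup, here is a minimal sketch that parses lines in the bench output format shown above and divides the non-parallel time by the parallel one. The helper names (parse_bench_line, speedup) are made up for illustration and are not part of exrs. On the four lines used here it works out to roughly 9.8x for reading zips and 13.2x for writing zip1 on this 16-core machine.

```rust
/// Parse one line of libtest bench output into (name, ns_per_iter).
/// Returns None for lines that are not benchmark results.
fn parse_bench_line(line: &str) -> Option<(String, u64)> {
    let rest = line.trim().strip_prefix("test ")?;
    let (name, tail) = rest.split_once(" ... bench:")?;
    // The first token after "bench:" is the time, with thousands separators.
    let number = tail.split_whitespace().next()?;
    let ns: u64 = number.replace(',', "").parse().ok()?;
    Some((name.trim().to_string(), ns))
}

/// How many times faster `fast_ns` is compared to `slow_ns`.
fn speedup(slow_ns: u64, fast_ns: u64) -> f64 {
    slow_ns as f64 / fast_ns as f64
}

fn main() {
    // Lines copied from the results above, ordered as (non-parallel, parallel) pairs.
    let output = "\
test read_single_image_zips_non_parallel_rgba ... bench: 137,174,659 ns/iter (+/- 417,130)
test read_single_image_zips_rgba ... bench: 13,942,807 ns/iter (+/- 319,728)
test write_nonparallel_zip1_to_buffered ... bench: 445,788,833 ns/iter (+/- 2,537,143)
test write_parallel_zip1_to_buffered ... bench: 33,796,498 ns/iter (+/- 1,998,909)";

    let results: Vec<(String, u64)> = output.lines().filter_map(parse_bench_line).collect();

    for pair in results.chunks(2) {
        let (slow, fast) = (&pair[0], &pair[1]);
        println!("{} vs {}: {:.1}x", fast.0, slow.0, speedup(slow.1, fast.1));
    }
}
```

The uncompressed read shows essentially no gain from parallelism (12.9 ms vs 13.2 ms), so the speedup appears to come from the compression work rather than from I/O.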
We should distinguish between the following two comparisons, and run both:
- Performance with equal settings (as close as possible)
- Performance out of the box
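A rough sketch of how such a comparison could be organised, using only the standard library. Everything here is hypothetical: the Case struct, the Comparison enum and the placeholder_decode workload are stand-ins, and a real harness would call into exrs and the openexr bindings where the placeholder is. The point is only that each library gets measured once per comparison kind, with the kind recorded next to the result.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// The two kinds of comparison from the list above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Comparison {
    EqualSettings,
    OutOfTheBox,
}

/// One benchmark case (hypothetical layout, not an exrs type).
struct Case {
    library: &'static str,
    comparison: Comparison,
    decode: fn(),
}

/// Run `decode` `runs` times and return the median wall-clock time.
fn median_time<F: FnMut()>(runs: usize, mut decode: F) -> Duration {
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            decode();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[samples.len() / 2]
}

/// Placeholder workload standing in for a real decode call.
fn placeholder_decode() {
    let sum: u64 = (0..10_000u64).sum();
    black_box(sum);
}

fn main() {
    // In a real harness each case would call a different decode function.
    let cases = [
        Case { library: "exrs", comparison: Comparison::EqualSettings, decode: placeholder_decode },
        Case { library: "exrs", comparison: Comparison::OutOfTheBox, decode: placeholder_decode },
        Case { library: "openexr", comparison: Comparison::EqualSettings, decode: placeholder_decode },
        Case { library: "openexr", comparison: Comparison::OutOfTheBox, decode: placeholder_decode },
    ];

    for case in cases.iter() {
        let median = median_time(10, case.decode);
        println!("{} ({:?}): {:?}", case.library, case.comparison, median);
    }
}
```

Reporting the median rather than the mean is a design choice made here to reduce the influence of occasional slow runs. The existing benches in this thread use the built-in bench harness instead.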