exrs Benchmark against the C implementation of OpenEXR

What can be improved or is missing?

Provide benchmarks comparing the performance of this crate to the OpenEXR reference implementation.

Implementation Approach

The openexr crate provides high-level, mostly safe bindings to the C implementation.

Jan 03 '23 12:01 Shnatsel

It would be interesting to have benchmarks on x86 as well as ARM.

I've tried running the benchmarks from #181 on a 16-core Ampere Altra machine from Google Cloud, and exrs in parallel mode absolutely rips :rocket:

running 3 tests
test read_image_rgba_f32_to_f16 ... bench:  32,518,506 ns/iter (+/- 176,580)
test read_image_rgba_f32_to_f32 ... bench:  12,701,451 ns/iter (+/- 205,549)
test read_image_rgba_f32_to_u32 ... bench:  13,428,159 ns/iter (+/- 93,722)

test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured

     Running benches/profiling.rs (target/release/deps/profiling-ddc84dc3d9a8e9fd)

running 2 tests
test read_single_image_all_channels             ... bench:  22,543,779 ns/iter (+/- 598,279)
test read_single_image_from_buffer_all_channels ... bench:  19,874,611 ns/iter (+/- 439,578)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured

     Running benches/read.rs (target/release/deps/read-35e4db800494d5a6)

running 8 tests
test read_single_image_rle_all_channels               ... bench:  23,277,019 ns/iter (+/- 3,901,982)
test read_single_image_rle_non_parallel_all_channels  ... bench:  33,362,265 ns/iter (+/- 293,883)
test read_single_image_rle_non_parallel_rgba          ... bench:  36,240,049 ns/iter (+/- 247,417)
test read_single_image_rle_rgba                       ... bench:  16,579,403 ns/iter (+/- 301,560)
test read_single_image_uncompressed_non_parallel_rgba ... bench:  12,898,483 ns/iter (+/- 171,469)
test read_single_image_uncompressed_rgba              ... bench:  13,151,095 ns/iter (+/- 162,987)
test read_single_image_zips_non_parallel_rgba         ... bench: 137,174,659 ns/iter (+/- 417,130)
test read_single_image_zips_rgba                      ... bench:  13,942,807 ns/iter (+/- 319,728)

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured

     Running benches/write.rs (target/release/deps/write-35aaf83004c9be4c)

running 5 tests
test write_nonparallel_zip1_to_buffered      ... bench: 445,788,833 ns/iter (+/- 2,537,143)
test write_parallel_any_channels_to_buffered ... bench:  31,390,870 ns/iter (+/- 998,324)
test write_parallel_zip16_to_buffered        ... bench:  43,083,441 ns/iter (+/- 2,608,785)
test write_parallel_zip1_to_buffered         ... bench:  33,796,498 ns/iter (+/- 1,998,909)
test write_uncompressed_to_buffered          ... bench:  21,775,658 ns/iter (+/- 413,420)

Decoding a zipped image in 13 milliseconds, how cool is that?

That holds as long as you don't need any pixel format conversions and don't run into #178 and #182. Those really rain on the parade if you try to decode into RGBA f16 like the reference OpenEXR does.
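
To put numbers on the parallel vs. non-parallel gap, here is a minimal sketch that parses `ns/iter` lines like the ones above and computes the ratio. The helper names `parse_bench_line` and `speedup` are made up for illustration and are not part of exrs; the two input lines are copied from the read.rs run above.

```rust
/// Parse one line of benchmark output into (test name, nanoseconds per iteration).
/// Returns None for lines that are not benchmark results.
fn parse_bench_line(line: &str) -> Option<(String, u64)> {
    let rest = line.trim().strip_prefix("test ")?;
    let (name, tail) = rest.split_once("... bench:")?;
    let (number, _) = tail.split_once("ns/iter")?;
    // Drop the thousands separators and padding before parsing.
    let digits: String = number.chars().filter(|c| c.is_ascii_digit()).collect();
    Some((name.trim().to_string(), digits.parse().ok()?))
}

/// How many times faster the `fast_ns` run is than the `slow_ns` run.
fn speedup(slow_ns: u64, fast_ns: u64) -> f64 {
    slow_ns as f64 / fast_ns as f64
}

fn main() {
    // Two lines copied from the benchmark output in this thread.
    let output = "\
test read_single_image_zips_non_parallel_rgba         ... bench: 137,174,659 ns/iter (+/- 417,130)
test read_single_image_zips_rgba                      ... bench:  13,942,807 ns/iter (+/- 319,728)";

    let results: Vec<(String, u64)> = output.lines().filter_map(parse_bench_line).collect();
    let (_, serial_ns) = &results[0];
    let (_, parallel_ns) = &results[1];

    println!("parallel decode: {:.1} ms", *parallel_ns as f64 / 1e6);
    println!("speedup over non-parallel: {:.1}x", speedup(*serial_ns, *parallel_ns));
}
```

By that arithmetic the zips read is roughly 9.8x faster in parallel mode on this 16-core machine, and the same calculation on the zip1 write numbers gives roughly 13.2x.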

Jan 03 '23 12:01 Shnatsel

We should keep the following two comparisons separate, but run both:

Performance with equal settings (as close as possible)
Performance out of the box
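
One way to keep the two comparisons apart in the results is to tag every measurement with the comparison it belongs to, plus the CPU architecture. The sketch below uses only the standard library. `Comparison`, `Measurement`, `median_time` and `measure` are hypothetical names, and the closure in `main` is a stand-in for real decode calls into exrs and the OpenEXR bindings, which are not shown here.

```rust
use std::time::{Duration, Instant};

/// Which of the two comparisons a measurement belongs to.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Comparison {
    /// Both decoders configured as similarly as possible.
    EqualSettings,
    /// Each decoder used with its defaults.
    OutOfTheBox,
}

/// One tagged result, so x86 and ARM runs can be told apart later.
#[derive(Debug)]
struct Measurement {
    comparison: Comparison,
    decoder: &'static str,
    arch: &'static str,
    time: Duration,
}

/// Run `work` `iterations` times and return the middle sample
/// (the upper median when the count is even).
fn median_time<F: FnMut()>(iterations: usize, mut work: F) -> Duration {
    assert!(iterations > 0, "need at least one iteration");
    let mut samples: Vec<Duration> = (0..iterations)
        .map(|_| {
            let start = Instant::now();
            work();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[samples.len() / 2]
}

/// Time `work` and label the result with its comparison, decoder and architecture.
fn measure<F: FnMut()>(
    comparison: Comparison,
    decoder: &'static str,
    iterations: usize,
    work: F,
) -> Measurement {
    Measurement {
        comparison,
        decoder,
        arch: std::env::consts::ARCH,
        time: median_time(iterations, work),
    }
}

fn main() {
    // HYPOTHETICAL workload: replace with a real decode call for each library.
    let mut checksum = 0u64;
    let mut fake_decode = || checksum = checksum.wrapping_add((0..10_000u64).sum::<u64>());

    let results = vec![
        measure(Comparison::EqualSettings, "placeholder", 11, &mut fake_decode),
        measure(Comparison::OutOfTheBox, "placeholder", 11, &mut fake_decode),
    ];
    for m in &results {
        println!("{:?} / {} / {}: {:?}", m.comparison, m.decoder, m.arch, m.time);
    }
}
```

The median is used here only because it is less sensitive to outliers than the mean; a real benchmark would more likely reuse the existing bench harness and just add the tag.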

Jan 06 '23 22:01 johannesvollmer
