ecs_bench_suite

User code vs benchmark monstrosities

Open: leudz opened this issue 5 years ago • 1 comment

I was playing with the fragmented iteration benchmark and made three versions:

  • V1 is a simple loop

time: [407.12 ns 409.32 ns 411.81 ns]

V1
self.0.run(|mut data: ViewMut<Data>| {
    (&mut data).iter().for_each(|data| {
        data.0 *= 2.0;
    })
});
  • V2 helps the compiler auto-vectorize

time: [165.88 ns 166.45 ns 167.05 ns]

V2
self.0.run(|mut data: ViewMut<Data>| {
    (&mut data)
        .iter()
        .into_chunk_exact(4)
        .unwrap_or_else(|_| panic!())
        .for_each(|chunk| {
            chunk[0].0 *= 2.0;
            chunk[1].0 *= 2.0;
            chunk[2].0 *= 2.0;
            chunk[3].0 *= 2.0;
        })
});
  • V3 explicitly uses SIMD

time: [127.37 ns 129.08 ns 131.26 ns]

V3
use core::arch::x86_64::*;

unsafe {
    let delta = _mm_set1_ps(2.0);

    self.0.run(|mut data: ViewMut<Data>| {
        (&mut data)
            .iter()
            .into_chunk_exact(4)
            .unwrap_or_else(|_| panic!())
            .for_each(|chunk| {
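                let simd_chunk = _mm_loadu_ps(chunk as *const _ as *const _);

                // _mm_mul_ps returns the product; keep it so the store writes the doubled values.
                let simd_chunk = _mm_mul_ps(simd_chunk, delta);

                _mm_storeu_ps(chunk as *mut _ as *mut _, simd_chunk);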
            })
    });
}
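
For readers who want to try the three variants without an ECS, here is a rough standalone sketch on a plain slice. It assumes `Data` is a newtype over `f32`. The function names (`double_simple`, `double_chunked`, `double_simd`) are made up for illustration, and it uses the standard library's `chunks_exact_mut` in place of shipyard's `into_chunk_exact`, so it is not the benchmark code itself. Note that `_mm_mul_ps` returns the product rather than modifying its argument, so the sketch stores the returned value.

```rust
/// Hypothetical stand-in for the benchmark component (assumed: a newtype over f32).
/// repr(transparent) guarantees the same layout as f32, which the SIMD variant relies on.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Data(f32);

/// V1: a simple loop.
fn double_simple(data: &mut [Data]) {
    data.iter_mut().for_each(|d| d.0 *= 2.0);
}

/// V2: fixed-size chunks of 4, which gives the compiler an easy pattern to vectorize.
fn double_chunked(data: &mut [Data]) {
    let mut chunks = data.chunks_exact_mut(4);
    for chunk in &mut chunks {
        chunk[0].0 *= 2.0;
        chunk[1].0 *= 2.0;
        chunk[2].0 *= 2.0;
        chunk[3].0 *= 2.0;
    }
    // Elements that do not fill a whole chunk are handled one by one.
    for d in chunks.into_remainder() {
        d.0 *= 2.0;
    }
}

/// V3: explicit SSE intrinsics (x86_64 only).
#[cfg(target_arch = "x86_64")]
fn double_simd(data: &mut [Data]) {
    use core::arch::x86_64::{_mm_loadu_ps, _mm_mul_ps, _mm_set1_ps, _mm_storeu_ps};

    let mut chunks = data.chunks_exact_mut(4);
    // SAFETY: SSE is part of the x86_64 baseline, each chunk is exactly four
    // contiguous f32-layout values (16 bytes), and the unaligned load/store
    // variants are used, so no alignment requirement applies.
    unsafe {
        let factor = _mm_set1_ps(2.0);
        for chunk in &mut chunks {
            let ptr = chunk.as_mut_ptr() as *mut f32;
            // Keep the returned product; the input register is not modified.
            let doubled = _mm_mul_ps(_mm_loadu_ps(ptr), factor);
            _mm_storeu_ps(ptr, doubled);
        }
    }
    for d in chunks.into_remainder() {
        d.0 *= 2.0;
    }
}

/// Fallback for other architectures so the sketch still compiles.
#[cfg(not(target_arch = "x86_64"))]
fn double_simd(data: &mut [Data]) {
    double_chunked(data);
}

fn main() {
    let input: Vec<Data> = (0..10).map(|i| Data(i as f32)).collect();
    let (mut a, mut b, mut c) = (input.clone(), input.clone(), input.clone());

    double_simple(&mut a);
    double_chunked(&mut b);
    double_simd(&mut c);

    // All three variants should produce identical results.
    assert_eq!(a, b);
    assert_eq!(a, c);
    println!("all three variants agree");
}
```

Checking the variants against each other like this is a cheap way to make sure a faster version is still doing the same work before comparing timings.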

V2 and V3 will likely be used by few people, if any, and the time is ridiculously small either way.

My question is: should the benchmarks only use code that users would actually write, be optimized as much as possible, or land somewhere in between?

leudz, Aug 25 '20 04:08

Excellent question. IMO we should favor realistic code; the goal here is to benchmark the libraries, not compete to see who can write the cheekiest benchmark code.

Ralith, Aug 25 '20 05:08