saxpy-benchmark
possible error in omp implementation
https://github.com/bennylp/saxpy-benchmark/blob/fb811ad7a5ac43d53948ca94357e209bbae6a6ed/src/saxpy_omp.cpp#L22
Hi! I think this for loop is not correct because:
- the loop counter is set to start at the thread ID, and I see no reason to write a loop like this
- the `#pragma omp parallel for` is missing, which makes this version essentially the same as the CPU version. In fact, their speeds are very similar on my machine.
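For reference, here is a minimal sketch of how I would expect the loop to look. This is not the repository's code: the function name `saxpy_parallel` and its signature are my own, and it assumes `x` and `y` have the same length.

```cpp
#include <vector>

// Hypothetical helper (not from the repository): computes y[i] = a * x[i] + y[i].
// Assumes x.size() == y.size().
void saxpy_parallel(float a, const std::vector<float>& x, std::vector<float>& y) {
    const long n = static_cast<long>(x.size());
    // The combined directive creates the thread team and divides the
    // iterations among the threads, so the loop simply runs from 0 to n-1
    // and does not need to start at the thread ID.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With GCC or Clang this needs `-fopenmp` at compile time. Without that flag the pragma is ignored and the loop runs serially, which would also explain timings close to the plain CPU version.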
I am currently working on a version with GPU offloading (clang-9, gcc-8)