adda
Timing of matrix vector multiplication in OpenCL mode
Precise timing of the OpenCL matvec was removed in r1334, which makes it
hard to track issues with the OpenCL kernels on different devices. It would be
useful to add an option that switches the command queue into profiling mode
and reads the precise timings from the events returned by
clEnqueueNDRangeKernel. This should be possible at runtime, so it can be
implemented as an option to adda itself rather than as a compile-time option
using ifdefs.
This would help to identify performance issues of the kernels on different
devices.
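
A minimal sketch of how the proposal above might look, not the actual adda code. The names `kernel_timer`, `kernel_timer_add`, `kernel_timer_mean_sec`, `create_queue`, `run_kernel` and the macro `EXAMPLE_WITH_OPENCL` are hypothetical and exist only for this illustration; the `profile` flag stands in for whatever runtime option adda would expose. The OpenCL part assumes the OpenCL 1.x host API (`clCreateCommandQueue` is deprecated since OpenCL 2.0) and is guarded only so that the accumulator can be compiled without the OpenCL headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accumulator for per-kernel device time; not an adda structure. */
typedef struct {
    uint64_t total_ns; /* summed (end - start) over all recorded launches */
    uint64_t launches; /* number of recorded launches */
} kernel_timer;

/* Add one launch, given the device timestamps (nanoseconds) of its start and end. */
static void kernel_timer_add(kernel_timer *t, uint64_t start_ns, uint64_t end_ns)
{
    if (end_ns >= start_ns) { /* ignore inconsistent timestamps */
        t->total_ns += end_ns - start_ns;
        t->launches++;
    }
}

/* Mean device time per launch in seconds; 0 if nothing was recorded. */
static double kernel_timer_mean_sec(const kernel_timer *t)
{
    return t->launches ? (double)t->total_ns * 1e-9 / (double)t->launches : 0.0;
}

#ifdef EXAMPLE_WITH_OPENCL /* define only if OpenCL headers and library are available */
#include <CL/cl.h>

/* Create the queue with profiling enabled only if requested at runtime. */
static cl_command_queue create_queue(cl_context ctx, cl_device_id dev,
                                     bool profile, cl_int *err)
{
    cl_command_queue_properties props = profile ? CL_QUEUE_PROFILING_ENABLE : 0;
    return clCreateCommandQueue(ctx, dev, props, err);
}

/* Launch a 1D kernel; when profiling, wait for it and record its device time. */
static cl_int run_kernel(cl_command_queue q, cl_kernel k, size_t gsize,
                         bool profile, kernel_timer *t)
{
    cl_event ev;
    cl_int err = clEnqueueNDRangeKernel(q, k, 1, NULL, &gsize, NULL,
                                        0, NULL, profile ? &ev : NULL);
    if (err != CL_SUCCESS || !profile) return err;

    cl_ulong start = 0, end = 0;
    err = clWaitForEvents(1, &ev); /* timestamps are valid only after completion */
    if (err == CL_SUCCESS)
        err = clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START,
                                      sizeof(start), &start, NULL);
    if (err == CL_SUCCESS)
        err = clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END,
                                      sizeof(end), &end, NULL);
    if (err == CL_SUCCESS) kernel_timer_add(t, start, end);
    clReleaseEvent(ev);
    return err;
}
#endif
```

Waiting on every event serializes the launches, so in this sketch the non-profiling path passes no event at all and keeps the queue behaviour unchanged. One `kernel_timer` per kernel would give a per-kernel breakdown for each device.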
r1334 - febb9ca148e8b25dfb05870743b38c780efc9fee
Original issue reported on code.google.com by [email protected] on 21 May 2014 at 7:22