
The Execution Time Difference Between Event_Profiling and Steady Clock

Open kaanolgu opened this issue 2 years ago • 2 comments

I am trying to profile my code and I am getting different results each time I run the code.

auto start = std::chrono::steady_clock::now();
foo();
auto end = std::chrono::steady_clock::now();
auto time = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();

Why does this give different results than using

auto start_kernel_time = event.get_profiling_info<info::event_profiling::command_start>();
auto end_kernel_time = event.get_profiling_info<info::event_profiling::command_end>();
double time_kernel = (end_kernel_time - start_kernel_time) / kNs; // kNs = 1e9: timestamps are in nanoseconds, so this yields seconds

I ran the code several times. Interestingly, the steady_clock version printed 6.7 seconds on the first run and 0.07 seconds on the second run (unrealistic), while event profiling printed ~0.64 seconds, which made more sense. Why are they different?

I am using oneAPI 2023 with the icpx compiler.

kaanolgu avatar Feb 13 '23 16:02 kaanolgu

OneSmpl_Team1 working on this

hexu33 avatar Apr 25 '23 01:04 hexu33

(OneSmpl_Team1)

@kaanolgu

When you run the application on the device for the first time, the kernels go through additional JIT (just-in-time) compilation. steady_clock measures host wall-clock time around the whole call, so it includes that compilation and shows a large number for the first run, whereas the event profiling timestamps cover only the kernel's execution on the device. You can use AOT (ahead-of-time) compilation (https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-1/ahead-of-time-compilation.html) to avoid the JIT cost. AOT can also tune code generation for a specific target.

Note that a program built with AOT compilation for specific target device(s) will not run on other devices, so select the target device carefully.
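
If you want to keep using steady_clock without AOT, one common workaround is to time a warm-up call separately and report only the later calls. The following is a minimal sketch in plain C++, not SYCL code: `fake_kernel_launch`, `time_once_ns`, `time_with_warmup`, and `Timings` are hypothetical names introduced for illustration, and the one-time 50 ms sleep only mimics a JIT cost (the numbers are made up).

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a kernel launch: the first call pays a one-time
// setup cost (mimicking JIT compilation); every call then does the "work".
inline void fake_kernel_launch() {
    static bool compiled = false;
    if (!compiled) {
        std::this_thread::sleep_for(std::chrono::milliseconds(50)); // one-time cost
        compiled = true;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(5)); // steady-state work
}

// Times a single call on the host with steady_clock and returns nanoseconds.
template <typename F>
std::int64_t time_once_ns(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}

// Holds the warm-up timing separately from the steady-state timings.
struct Timings {
    std::int64_t warmup_ns = 0;
    std::vector<std::int64_t> steady_ns;
};

// Runs one warm-up call (timed separately), then times `runs` further calls.
template <typename F>
Timings time_with_warmup(F&& f, int runs) {
    Timings t;
    t.warmup_ns = time_once_ns(f);
    for (int i = 0; i < runs; ++i) {
        t.steady_ns.push_back(time_once_ns(f));
    }
    return t;
}
```

Calling `time_with_warmup(fake_kernel_launch, 3)` returns one warm-up timing that includes the simulated one-time cost and three steady-state timings that do not. In a real SYCL program, the call being timed would be the kernel submission followed by a wait on the returned event, so that the host timer covers the whole execution.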

hexu33 avatar Apr 25 '23 01:04 hexu33