The Execution Time Difference Between Event_Profiling and Steady Clock
I am trying to profile my code and I am getting different results each time I run the code.
auto start = std::chrono::steady_clock::now();
foo();
auto end = std::chrono::steady_clock::now();
time = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
Why does this give different results than using
start_kernel_time = event.get_profiling_info<info::event_profiling::command_start>();
end_kernel_time = event.get_profiling_info<info::event_profiling::command_end>();
time_kernel = (end_kernel_time - start_kernel_time) / kNs; // kNs = 1e9
I ran the code several times. Interestingly, the steady_clock version printed 6.7 seconds on the first run and 0.07 seconds on the second run (unrealistic), while event profiling printed ~0.64 seconds, which makes more sense. Why are they different?
I am using oneAPI 2023 with the icpx compiler.
(OneSmpl_Team1)
@kaanolgu
When you run the application on the device for the first time, additional JIT (just-in-time) compilation takes place, so steady_clock shows a large number for the first run. You can use AOT (ahead-of-time) compilation (https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-1/ahead-of-time-compilation.html) to avoid this. AOT can also tune code generation for a specific target.
Please note that a program built with AOT compilation for specific target devices will not run on other devices, so select the target device carefully.
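If AOT is not convenient, another common way to keep one-time costs out of a host-side measurement is to time the first call separately and average only the later, warm calls. The following is a minimal sketch in standard C++ only, not SYCL code: run_workload is a hypothetical stand-in for foo(), and its 50 ms first-call delay merely simulates a one-time cost such as JIT compilation. The names Timing, time_once_seconds, and measure are also made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for foo(): the first call pays a simulated one-time
// cost (playing the role of JIT compilation); later calls do not.
void run_workload() {
    static bool first_call = true;
    if (first_call) {
        first_call = false;
        std::this_thread::sleep_for(std::chrono::milliseconds(50)); // simulated one-time cost
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(5)); // simulated steady-state work
}

// Time a single call with steady_clock and return the elapsed time in seconds.
template <typename F>
double time_once_seconds(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

struct Timing {
    double first_call_s; // first call, including any one-time cost
    double warm_avg_s;   // average over the following warm calls
};

// Time the first call on its own, then average `warm_repeats` further calls.
template <typename F>
Timing measure(F&& work, int warm_repeats) {
    assert(warm_repeats > 0);
    Timing t{};
    t.first_call_s = time_once_seconds(work);
    double total = 0.0;
    for (int i = 0; i < warm_repeats; ++i) {
        total += time_once_seconds(work);
    }
    t.warm_avg_s = total / warm_repeats;
    return t;
}
```

Calling measure(run_workload, 3) from main and printing both fields should show a first-call time clearly larger than the warm average. In a real SYCL program the timed callable would also need to wait for the kernel to finish (for example by waiting on the event) before the end timestamp is taken. Otherwise the host clock may stop while the device is still running.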