unified-runtime

[HIP][CUDA] Refactor using profiling events

Open: keyradical opened this issue 1 year ago • 1 comment

HIP changes:

  • To match the current behaviour of the CUDA adapter, EvBase in HIP was moved to the device, and the getElapsedTime function now handles synchronization of the profiling events. We were also using the hipStreamDefault flag by default, whereas the CUDA adapter uses CU_STREAM_NON_BLOCKING; the HIP default was changed to match CUDA (commits 1-3).
  • Added an extra profiling stream to Queue. It is created only when profiling is enabled and is used to record EvQueued. This was necessary because EvQueued was previously recorded on the NULL stream, which might not be the best solution for HIP; see https://github.com/intel/llvm/issues/12904 (commit 4).
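
To illustrate the idea of a profiling stream that exists only when profiling is enabled, here is a minimal, host-only sketch. It does not use the real adapter types or any HIP/CUDA calls: MockStream, MockQueue, createNonBlockingStream and streamCreateCount are hypothetical names invented for this example, and creating the stream on first use (rather than at queue construction) is an assumption, not necessarily what the PR does.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a native stream handle (hipStream_t / CUstream).
struct MockStream {
  bool NonBlocking; // mirrors the non-blocking creation flag mentioned above
};

// Counts stream creations so the "only when profiling is enabled" behaviour
// can be observed from outside.
inline int &streamCreateCount() {
  static int Count = 0;
  return Count;
}

// Hypothetical helper: in a real adapter this is where the native
// stream-creation call would go.
inline std::unique_ptr<MockStream> createNonBlockingStream() {
  ++streamCreateCount();
  return std::make_unique<MockStream>(MockStream{true});
}

// Simplified queue model. Not thread-safe; a real queue would need to guard
// the lazy creation against concurrent submissions.
class MockQueue {
public:
  explicit MockQueue(bool ProfilingEnabled)
      : ProfilingEnabled(ProfilingEnabled) {}

  // Returns the stream on which the "queued" timestamp would be recorded.
  // Returns nullptr when profiling is disabled, so no stream is ever created
  // for queues that do not need one.
  MockStream *getProfStream() {
    if (!ProfilingEnabled)
      return nullptr;
    if (!ProfStream)
      ProfStream = createNonBlockingStream();
    assert(ProfStream && "profiling stream must exist once requested");
    return ProfStream.get();
  }

private:
  bool ProfilingEnabled;
  std::unique_ptr<MockStream> ProfStream;
};
```

In this sketch, a queue created without profiling never pays for the extra stream, while a profiling queue creates it once and reuses it for every subsequent submission.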

CUDA changes:

  • The same extra profiling stream was added to the CUDA adapter for consistency (commit 5).

intel/llvm CI: https://github.com/intel/llvm/pull/13861

keyradical, May 20 '24 16:05

Also, I wouldn't mind if ProfStream had a different name, maybe something like HostSubmitTimeStream, with a comment explaining what it is used for.

hdelan, May 22 '24 12:05