unified-runtime
[HIP][CUDA] Refactor using profiling events
HIP changes:
- To match the current behaviour of the CUDA adapter, `EvBase` in HIP was moved to `device`, and the `getElapsedTime` function now handles synchronization of the profiling events. Also, we were using the `hipStreamDefault` flag by default, whereas in CUDA we use `CU_STREAM_NON_BLOCKING`; this was also changed to match CUDA (commits 1-3).
- Added an extra profiling stream to `Queue`, which is only created when profiling is enabled and is used to record `EvQueued`. This was necessary because previously we recorded it on the `NULL` stream, which might not be the best solution for HIP; see https://github.com/intel/llvm/issues/12904 (commit 4).
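The idea behind moving `EvBase` to the device and having `getElapsedTime` synchronize can be sketched with a host-only model. This is not the adapter code: `MockEvent`, `MockDevice`, `readyEvent` and `recordEvent` are hypothetical, a `std::shared_future` stands in for a device event (its timestamp is only readable once it completes), and `wait()` stands in for an event-synchronize call. The nanosecond result is an assumption based on how profiling timestamps are usually reported.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <future>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a device event: the timestamp only becomes
// readable once the "device" has completed the event.
using MockEvent = std::shared_future<Clock::time_point>;

// An event that is already complete, timestamped at creation.
MockEvent readyEvent() {
  std::promise<Clock::time_point> P;
  P.set_value(Clock::now());
  return P.get_future().share();
}

// Simulate recording an event behind some asynchronous work that takes
// at least `Delay` to finish.
MockEvent recordEvent(std::chrono::milliseconds Delay) {
  return std::async(std::launch::async, [Delay] {
           std::this_thread::sleep_for(Delay);
           return Clock::now();
         }).share();
}

struct MockDevice {
  // Base event recorded once, when the device object is created.
  MockEvent EvBase;
  MockDevice() : EvBase(readyEvent()) {}

  // Synchronize both events first (so neither timestamp is read before it
  // exists), then return the time of Ev relative to EvBase in nanoseconds.
  uint64_t getElapsedTime(const MockEvent &Ev) const {
    EvBase.wait();
    Ev.wait();
    auto Delta = Ev.get() - EvBase.get();
    return static_cast<uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(Delta).count());
  }
};
```

Because every event is measured against the same per-device `EvBase`, timestamps from different queues on that device share one origin, and the caller never has to synchronize the events itself.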
CUDA changes:
- Also added the extra profiling stream to the queue for consistency (commit 5).
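The queue-side change in both adapters can be pictured with another host-only sketch. `MockStream` and `MockQueue` are hypothetical; only `ProfStream` and `EvQueued` come from this PR. The sketch shows the two properties described above: the extra stream exists only when profiling is enabled, and `EvQueued` is recorded on it rather than on the `NULL` stream or the stream that runs the command.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device stream: it only logs what was recorded.
struct MockStream {
  std::string Name;
  std::vector<std::string> Recorded;
  explicit MockStream(std::string N) : Name(std::move(N)) {}
  void record(const std::string &What) { Recorded.push_back(What); }
};

struct MockQueue {
  MockStream ComputeStream{"compute"};
  // Created only when profiling is enabled, so queues without profiling
  // do not allocate an extra stream.
  std::unique_ptr<MockStream> ProfStream;

  explicit MockQueue(bool ProfilingEnabled) {
    if (ProfilingEnabled)
      ProfStream = std::make_unique<MockStream>("profiling");
  }

  // EvQueued goes on the dedicated profiling stream; the command itself
  // still runs on the compute stream.
  void enqueue(const std::string &Cmd) {
    if (ProfStream)
      ProfStream->record(Cmd + ":EvQueued");
    ComputeStream.record(Cmd);
  }
};
```

Keeping `EvQueued` off the compute stream is presumably what lets it reflect host submission time rather than waiting behind earlier work, which matches the reviewer's suggested name `HostSubmitTimeStream`.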
intel/llvm CI: https://github.com/intel/llvm/pull/13861
Also, I wouldn't mind if `ProfStream` had a different name, maybe something like `HostSubmitTimeStream`, with a comment explaining what it is used for.