
Kernel execution serialization

Open yupinov opened this issue 6 years ago • 4 comments

Is there an option to make all kernels execute sequentially (especially when work is launched in multiple queues)? Coming from CUDA and nvprof, I was surprised not to find such a feature, which would help in understanding kernel performance.

yupinov, Apr 23 '18 10:04

When collecting performance counters, the profiler will introduce serialization to try to ensure that only one kernel is executing at a time. There is no option for this, as it is the default behavior.

chesik-amd, Apr 23 '18 12:04

What about measuring performance in a real-life environment, under concurrent execution?

Additionally, this seems to imply that traces in CodeXL can't be used to analyze kernel overlap?

pszi1ard, Feb 15 '19 20:02

Serialization is only done when collecting performance counters, which is the mode you would use to analyze the performance of individual kernels. No additional serialization is introduced when collecting a trace, which is the mode you would use to analyze an entire application, including kernel overlap.

chesik-amd, Feb 15 '19 20:02
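
As an illustration of the kind of overlap analysis a trace makes possible, here is a minimal Python sketch. It assumes you have already extracted per-kernel start and end timestamps from a trace into a list; the `KernelSpan` type, the field names, and the sample values are hypothetical and are not the profiler's actual output format.

```python
from typing import List, NamedTuple


class KernelSpan(NamedTuple):
    # Hypothetical record; a real trace file would need to be parsed into this shape.
    name: str
    queue: int
    start_ns: int
    end_ns: int


def overlap_ns(a: KernelSpan, b: KernelSpan) -> int:
    """Time during which both kernels were executing (0 if they never overlap)."""
    return max(0, min(a.end_ns, b.end_ns) - max(a.start_ns, b.start_ns))


def concurrent_time_ns(spans: List[KernelSpan]) -> int:
    """Total time during which at least two kernels were executing at once."""
    events = []
    for s in spans:
        events.append((s.start_ns, 1))   # kernel starts
        events.append((s.end_ns, -1))    # kernel ends
    # At equal timestamps an end (-1) sorts before a start (+1),
    # so back-to-back kernels are not counted as overlapping.
    events.sort()

    active = 0
    prev = 0
    total = 0
    for t, delta in events:
        if active >= 2:
            total += t - prev
        active += delta
        prev = t
    return total


# Made-up sample data: two queues, one overlapping pair.
spans = [
    KernelSpan("kernel_a", queue=0, start_ns=0, end_ns=100),
    KernelSpan("kernel_b", queue=1, start_ns=60, end_ns=150),
    KernelSpan("kernel_c", queue=0, start_ns=200, end_ns=250),
]

print("a/b overlap (ns):", overlap_ns(spans[0], spans[1]))
print("concurrent time (ns):", concurrent_time_ns(spans))
```

With serialized (performance counter) collection, the concurrent time would be expected to be zero, so a check like this only makes sense on timestamps taken from a trace.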

I see. I'd suggest allowing serialization to be turned on/off.

Is there a way to measure only wall time, without serialization?

pszi1ard, Feb 15 '19 22:02