AMDGPU.jl
Integrate Dagger's event logging for timeline profiling
This would reduce the need to use rocprof or the Profile stdlib to observe kernel execution ordering and how effectively latency is being hidden.
That'd be a really helpful addition.