
Profiler does not trace later iterations of a CUDA graph while node

Open · saagarjha opened this issue · 0 comments

(I'm actually seeing this in the PyTorch profiler, fwiw: I just assume that it's using Kineto under the hood.)

If I create and launch a CUDA graph that contains a while node, I see the launch and the first iteration of the kernels in that node. However, when the body runs repeatedly (on the GPU, driven by the while node, rather than being re-launched from the host), the later iterations are not captured. Roughly, assume I have a graph as follows:

Graph:

  • Kernel A
  • Kernel B
  • Kernel C

If I launch the graph, it might run several times, depending on when the kernels set its loop condition to exit. In the profiler, however, I only see one run of "Kernel A", "Kernel B", and "Kernel C", and none of the subsequent iterations.
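A minimal reproduction of a graph like the one above might look like the following sketch. It is not code from this report. It assumes the CUDA 12.x conditional-node API (`cudaGraphConditionalHandleCreate`, `cudaGraphSetConditional`, and `cudaGraphAddNode` with `cudaGraphCondTypeWhile`). The kernel names, the `d_remaining`/`d_iterations` counters, and the `addKernel`/`run_while_graph` helpers are hypothetical. Kernel C sets the loop condition, so the body A, B, C runs a caller-chosen number of times on the GPU after a single host launch.

```cuda
// Sketch assuming the CUDA 12.x conditional-node API; names are hypothetical.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         cudaGetErrorString(err_));                  \
            std::exit(1);                                            \
        }                                                            \
    } while (0)

// Hypothetical device-side counters used to drive and observe the loop.
__device__ int d_remaining;   // iterations left before the loop exits
__device__ int d_iterations;  // how many times the body actually ran

// Placeholder work standing in for real kernels A and B.
__global__ void kernelA() {}
__global__ void kernelB() {}

// Kernel C records the iteration and sets the while node's condition:
// non-zero re-runs the body, zero exits the loop.
__global__ void kernelC(cudaGraphConditionalHandle handle) {
    ++d_iterations;
    --d_remaining;
    cudaGraphSetConditional(handle, d_remaining > 0 ? 1u : 0u);
}

// Add a single-thread kernel node to `g` that depends on `dep` (may be null).
static cudaGraphNode_t addKernel(cudaGraph_t g, void *func, void **args,
                                 const cudaGraphNode_t *dep) {
    cudaGraphNodeParams p = { cudaGraphNodeTypeKernel };
    p.kernel.func = func;
    p.kernel.gridDim.x = p.kernel.gridDim.y = p.kernel.gridDim.z = 1;
    p.kernel.blockDim.x = p.kernel.blockDim.y = p.kernel.blockDim.z = 1;
    p.kernel.kernelParams = args;
    cudaGraphNode_t node;
    CHECK(cudaGraphAddNode(&node, g, dep, dep ? 1 : 0, &p));
    return node;
}

// Build a graph with one while node whose body is A -> B -> C, launch it
// once from the host, and return how many times the body executed.
int run_while_graph(int iterations) {
    int zero = 0;
    CHECK(cudaMemcpyToSymbol(d_remaining, &iterations, sizeof(int)));
    CHECK(cudaMemcpyToSymbol(d_iterations, &zero, sizeof(int)));

    cudaGraph_t graph;
    CHECK(cudaGraphCreate(&graph, 0));

    // The condition is reset to 1 at every launch, so the body runs at least once.
    cudaGraphConditionalHandle handle;
    CHECK(cudaGraphConditionalHandleCreate(&handle, graph, 1,
                                           cudaGraphCondAssignDefault));

    cudaGraphNodeParams cParams = { cudaGraphNodeTypeConditional };
    cParams.conditional.handle = handle;
    cParams.conditional.type = cudaGraphCondTypeWhile;
    cParams.conditional.size = 1;
    cudaGraphNode_t whileNode;
    CHECK(cudaGraphAddNode(&whileNode, graph, nullptr, 0, &cParams));

    // The runtime creates the body graph; populate it with A -> B -> C.
    cudaGraph_t body = cParams.conditional.phGraph_out[0];
    void *cArgs[] = { &handle };
    cudaGraphNode_t a = addKernel(body, (void *)kernelA, nullptr, nullptr);
    cudaGraphNode_t b = addKernel(body, (void *)kernelB, nullptr, &a);
    addKernel(body, (void *)kernelC, cArgs, &b);

    cudaGraphExec_t exec;
    CHECK(cudaGraphInstantiate(&exec, graph, 0));
    CHECK(cudaGraphLaunch(exec, 0));  // one host launch; looping happens on the GPU
    CHECK(cudaDeviceSynchronize());

    int ran = 0;
    CHECK(cudaMemcpyFromSymbol(&ran, d_iterations, sizeof(int)));
    CHECK(cudaGraphExecDestroy(exec));
    CHECK(cudaGraphDestroy(graph));
    return ran;
}
```

Calling `run_while_graph(5)` from a host `main` should report five body iterations, which means five executions each of kernels A, B, and C from a single `cudaGraphLaunch`. Profiling such a program is one way to check whether the trace shows all five iterations or only the first, as described above.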

saagarjha, Mar 21 '25 00:03