kineto
Profiler does not trace iterations of CUDA graph while node
(I'm actually seeing this in the PyTorch profiler, FWIW; I'm assuming it uses Kineto under the hood.)
If I create and launch a CUDA graph that contains a while node, I see the launch and the first iteration of the kernels in that node. But if the loop body runs repeatedly (on the GPU, because of the while node, rather than being re-launched from the host), the later iterations are not captured. Roughly, assume I have a graph as follows:
Graph:
- Kernel A
- Kernel B
- Kernel C
When I launch the graph, it may run several times, depending on when the kernels set its loop condition to exit. However, the profiler shows only ["Kernel A" "Kernel B" "Kernel C"] and none of the subsequent iterations.
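To make the discrepancy concrete, here is a minimal Python sketch (not part of Kineto or PyTorch) that models the while-node semantics to get the kernel sequence a complete trace should contain, then counts kernel events in an exported Chrome trace. It assumes Kineto-style trace JSON in which GPU kernels appear as events with `"cat": "kernel"`. `expected_kernel_sequence`, `kernel_counts`, and the `next_condition` callback are hypothetical helpers, and the loop is assumed to exit after three iterations purely for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def expected_kernel_sequence(body: List[str], default_value: int,
                             next_condition: Callable[[int], int]) -> List[str]:
    """Model a CUDA graph while node: the body runs while the condition is
    nonzero, and a body kernel sets the condition checked before the next pass.

    `next_condition(i)` is a hypothetical stand-in for the value a kernel
    writes (e.g. via cudaGraphSetConditional) at the end of iteration i.
    """
    executed: List[str] = []
    condition = default_value  # initial value of the conditional handle
    iteration = 0
    while condition:
        executed.extend(body)  # every body kernel runs once per iteration
        condition = next_condition(iteration)
        iteration += 1
    return executed


def kernel_counts(trace: dict, names: Iterable[str]) -> Dict[str, int]:
    """Count GPU kernel events per name in a Chrome-trace dict.

    Assumes Kineto-style events where GPU kernels carry "cat": "kernel";
    runtime-API events with the same name are ignored.
    """
    wanted = set(names)
    counts = Counter(
        ev.get("name")
        for ev in trace.get("traceEvents", [])
        if ev.get("cat") == "kernel" and ev.get("name") in wanted
    )
    return {name: counts.get(name, 0) for name in wanted}


if __name__ == "__main__":
    body = ["Kernel A", "Kernel B", "Kernel C"]
    # Hypothetical run: the kernels clear the condition after the third pass.
    expected = Counter(expected_kernel_sequence(body, 1, lambda i: 1 if i < 2 else 0))
    # Synthetic trace shaped like the observation: each kernel appears once.
    observed_trace = {"traceEvents": [{"ph": "X", "cat": "kernel", "name": n} for n in body]}
    observed = kernel_counts(observed_trace, body)
    for name in body:
        print(f"{name}: expected {expected[name]}, traced {observed[name]}")
```

With a real profile, the file written by `prof.export_chrome_trace("trace.json")` can be loaded with `json.load` and passed to `kernel_counts`; in the situation described above, each body kernel should appear once per loop iteration, but the trace shows it only once.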