torchtitan
How to use nsys?
Is there a recommended way to use nsys / Nsight Systems? I know there's a profiling hook for the PyTorch profiler, but I'm wondering how to use nsys instead.
Can I use these APIs:
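import torch
import torch.cuda.profiler as profiler

x = torch.randn(2, 3, 4, 4, device="cuda")  # example 4D input (assumed; not in the original question)
with torch.autograd.profiler.emit_nvtx():  # emit an NVTX range for every autograd op
    profiler.start()  # calls cudaProfilerStart()
    y = x.view(1, -1)
    z = x.to(memory_format=torch.channels_last)
    zz = z.reshape(1, -1)
    profiler.stop()  # calls cudaProfilerStop()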
Furthermore, I'm not sure which of the below I'm supposed to use:
import torch.cuda.profiler as profiler
with torch.autograd.profiler.emit_nvtx():
Hey @vedantroy, IIUC emit_nvtx just adds additional information (an NVTX range for each autograd op) to the trace; it doesn't profile anything by itself. To actually profile your program with nsys, you have to start your program under it (e.g., nsys profile --gpu-metrics-device=0 -o [output] [command]).
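To tie the two pieces together, here is a minimal sketch of a launcher that wraps a training command in nsys profile. It assumes nsys is on your PATH; build_nsys_command and the train.py entry point are hypothetical names, not part of torchtitan. The --trace=cuda,nvtx option makes sure the NVTX ranges from emit_nvtx are recorded, and --capture-range=cudaProfilerApi limits collection to the region between torch.cuda.profiler.start() and stop(), which call cudaProfilerStart()/cudaProfilerStop(). Check nsys profile --help for your installed version before relying on these flags.

```python
import shlex
from typing import List, Optional, Sequence


def build_nsys_command(
    command: Sequence[str],
    output: str,
    gpu_metrics_device: Optional[int] = 0,
    capture_cuda_profiler_range: bool = False,
) -> List[str]:
    """Wrap `command` in an `nsys profile` invocation (hypothetical helper)."""
    # Trace CUDA API calls/kernels plus NVTX ranges (e.g. from emit_nvtx).
    args = ["nsys", "profile", "--trace=cuda,nvtx"]
    if gpu_metrics_device is not None:
        # Sample GPU hardware metrics on this device, as in the reply above.
        args.append(f"--gpu-metrics-device={gpu_metrics_device}")
    if capture_cuda_profiler_range:
        # Only record between torch.cuda.profiler.start() and .stop(),
        # i.e. cudaProfilerStart()/cudaProfilerStop().
        args.append("--capture-range=cudaProfilerApi")
    # Output report name, followed by the program to profile.
    args += ["-o", output]
    return args + list(command)


if __name__ == "__main__":
    # Hypothetical training entry point; substitute your real launch command.
    cmd = build_nsys_command(
        ["python", "train.py"],
        output="titan_profile",
        capture_cuda_profiler_range=True,
    )
    # Print a shell-ready command line to copy and run.
    print(shlex.join(cmd))
```

Running the script prints nsys profile --trace=cuda,nvtx --gpu-metrics-device=0 --capture-range=cudaProfilerApi -o titan_profile python train.py. Building the argument list instead of a string keeps it safe to pass straight to subprocess.run if you prefer to launch from Python.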