torchtitan

Profiling only a select group of ranks

Open githubsgi opened this issue 8 months ago • 3 comments

Is it possible to profile only a select group of ranks? With many ranks, the large number of output files becomes hard to handle. I understand that there could be imbalances when only a few ranks are profiled. I do not know whether there is a way to run the profiler but not dump the profile output file.

githubsgi, Mar 31 '25 20:03

Currently, TorchTitan only supports rank0 profiling or all-ranks profiling. But this request is reasonable. Do you want to submit a PR for this?

fegin, Apr 01 '25 17:04

Sure, I can do a PR.

githubsgi, Apr 01 '25 18:04

> Currently, TorchTitan only supports rank0 profiling or all-ranks profiling.

btw, that applies to the TensorBoard / wandb logs. For the profiler, all ranks are profiled.

I think it's OK to not generate profiler files on all ranks.

tianyu-l, Apr 01 '25 23:04
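
For anyone landing here, a minimal sketch of the kind of rank filter discussed above. This is not TorchTitan's API or the eventual PR: `parse_profile_ranks`, `should_profile`, `maybe_profile`, and the `PROFILE_RANKS` environment variable are all hypothetical names made up for illustration. The only outside assumption is that the launcher exports `RANK` for each worker, as `torchrun` does. The profiler itself is passed in as a factory, so on unselected ranks nothing is created and no trace file is written.

```python
import contextlib
import os


def parse_profile_ranks(spec):
    """Parse a comma-separated rank list such as "0,8,16" into a set of ints.

    An empty spec means "no filter" and is signalled by returning None.
    """
    spec = spec.strip()
    if not spec:
        return None
    return {int(part) for part in spec.split(",") if part.strip()}


def should_profile(rank, profile_ranks):
    """Return True if this rank should run the profiler and write a trace."""
    return profile_ranks is None or rank in profile_ranks


def maybe_profile(rank, profile_ranks, make_profiler):
    """Return a real profiler context on selected ranks, a no-op elsewhere.

    make_profiler is any zero-argument callable that returns a context
    manager, e.g. a lambda wrapping torch.profiler.profile(...).
    """
    if should_profile(rank, profile_ranks):
        return make_profiler()
    return contextlib.nullcontext()


if __name__ == "__main__":
    # torchrun exports RANK for every worker process; fall back to 0 otherwise.
    rank = int(os.environ.get("RANK", "0"))
    # PROFILE_RANKS is a hypothetical variable used only in this sketch.
    ranks = parse_profile_ranks(os.environ.get("PROFILE_RANKS", "0"))
    # contextlib.nullcontext stands in for a real profiler factory here.
    with maybe_profile(rank, ranks, contextlib.nullcontext):
        pass  # the training loop would go here
```

Note that this skips profiling entirely on unselected ranks rather than profiling everywhere and suppressing the dump, so the imbalance concern raised in the original question still applies.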