Extend Dry Run Mode to Cover Trainer Initialization in TorchTitan

Open fegin opened this issue 1 month ago • 0 comments

Pull request [#2012](https://github.com/pytorch/torchtitan/pull/2012/) introduced a dry run mode for TorchTitan, but the current implementation covers only the configuration system. Other components, such as `Trainer.__init__()`, are not exercised by the dry run.

To make dry run mode more useful, we should consider leveraging the fake process group (PG) mode. With a fake PG, the dry run could cover the entire `Trainer.__init__()` path, allowing more comprehensive validation and testing without requiring a full environment setup.

fegin, Nov 14 '25 17:11