torchtitan
Add Pipeline Parallel (and 2D PP+FSDP) support
Stack from ghstack (oldest at bottom):
- #340
- #337
- -> #318
Runs PP+DP and PP+TP without issue. Runs PP+TP+DP with decreasing loss, but fails on DCP save.
Currently supports only simple schedules: GPipe and 1F1B.
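For reviewers less familiar with the two schedules, here is a minimal pure-Python sketch of the per-stage order of forward (F) and backward (B) microbatch steps. It is an illustration only, not the code in this PR; the helper names `gpipe_order`, `one_f_one_b_order`, and `peak_in_flight` are hypothetical, and the 1F1B warmup count assumes the standard non-interleaved formulation.

```python
def gpipe_order(n_microbatches):
    """GPipe: run every forward first, then every backward."""
    fwd = [("F", i) for i in range(n_microbatches)]
    bwd = [("B", i) for i in range(n_microbatches)]
    return fwd + bwd


def one_f_one_b_order(stage, num_stages, n_microbatches):
    """1F1B (non-interleaved): warm up, alternate F/B, then drain."""
    # Assumption: earlier stages run more warmup forwards than later ones.
    warmup = min(num_stages - stage - 1, n_microbatches)
    order = [("F", i) for i in range(warmup)]
    fwd, bwd = warmup, 0
    # Steady state: one forward followed by one backward.
    while fwd < n_microbatches:
        order.append(("F", fwd))
        fwd += 1
        order.append(("B", bwd))
        bwd += 1
    # Cooldown: drain the remaining backwards.
    while bwd < n_microbatches:
        order.append(("B", bwd))
        bwd += 1
    return order


def peak_in_flight(order):
    """Max number of microbatches whose activations are held at once."""
    live = peak = 0
    for op, _ in order:
        live += 1 if op == "F" else -1
        peak = max(peak, live)
    return peak
```

With 2 stages and 4 microbatches, stage 0 holds all 4 microbatches at its peak under the GPipe ordering but only 2 under 1F1B, which is the usual motivation for 1F1B.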
Adds a cmdline/toml arg for specifying split points, in a unified way between the tracer and manual frontends.
e.g. the user can specify "layers.2,layers.4" as split points.
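As a rough sketch of how such a split-point string could map onto stages: the helpers `parse_split_points` and `assign_stages` below are hypothetical and not part of this PR, the module names are illustrative, and the sketch assumes each split point names the first module of a new stage.

```python
def parse_split_points(spec):
    """Turn "layers.2,layers.4" into ["layers.2", "layers.4"]."""
    return [name.strip() for name in spec.split(",") if name.strip()]


def assign_stages(module_names, split_points):
    """Group ordered module names into pipeline stages.

    Assumption: each split point is the first module of a new stage.
    """
    unknown = [p for p in split_points if p not in module_names]
    if unknown:
        raise ValueError(f"unknown split points: {unknown}")
    split_set = set(split_points)
    stages = [[]]
    for name in module_names:
        if name in split_set:
            stages.append([])  # start a new stage at this module
        stages[-1].append(name)
    return stages


# Illustrative module list for a small 6-layer transformer.
modules = ["tok_embeddings"] + [f"layers.{i}" for i in range(6)] + ["norm", "output"]
stages = assign_stages(modules, parse_split_points("layers.2,layers.4"))
for idx, names in enumerate(stages):
    print(idx, names)
```

Two split points yield three stages under this assumption, so the number of pipeline stages is one more than the number of split points.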
Currently uses the manual frontend by default, but allows specifying the tracer frontend. The tracer frontend requires working around additional compatibility limitations, which are surfaced by raising assertions, and is not ready for wider use yet.