
A PyTorch native library for large-scale model training

Results 362 torchtitan issues

Would like to discuss adding a 'dry_run' flag as a general option. What it does: if the user adds --dry_run, then the config file specified is still run with everything the...

enhancement
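A minimal sketch of what such a flag could look like, assuming an argparse-style CLI; the option and function names here are illustrative, not torchtitan's actual interface:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical trainer CLI with a --dry_run switch.
    parser = argparse.ArgumentParser(description="trainer (sketch)")
    parser.add_argument("--config", default="train_config.toml")
    parser.add_argument(
        "--dry_run",
        action="store_true",
        help="parse and validate the config, then exit before training",
    )
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    # ... load and validate the config here ...
    if args.dry_run:
        print(f"dry run: validated {args.config}, exiting before training")
        return 0
    # ... launch training ...
    return 0
```

Because `store_true` defaults to `False`, existing invocations are unaffected unless the flag is passed explicitly.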

In parallelism/__init__.py, build_mesh uses zip with the kwarg strict=True:

~~~
for d, name in zip(
    [self.dp, self.sp, self.pp], ["dp", "sp", "pp"], strict=True
):
~~~

This is apparently a 3.10+ keyword,...

documentation

Currently, torch compiling the default llama model will generate a warning about being unable to lower complex numbers:

```
torch/_inductor/lowering.py:1639: UserWarning: Torchinductor does not support code generation for complex operators....
```

bug
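The complex ops come from the rotary embedding, which rotates pairs of values by multiplying with `freqs_cis`. One common workaround when a compiler cannot lower complex operators is to expand the complex multiply into real arithmetic, (a+bi)(c+di) = (ac - bd) + (ad + bc)i. A pure-Python sketch of the identity (illustrative only, not torchtitan code):

```python
import cmath
import math


def complex_mul_as_real(a, b, c, d):
    # (a + bi) * (c + di), expressed with only real multiplies/adds.
    return (a * c - b * d, a * d + b * c)


# Rotate x by angle theta, both as a complex multiply and as real ops.
theta = math.pi / 4
x = complex(1.0, 2.0)
rot = cmath.exp(1j * theta)

expected = x * rot
re, im = complex_mul_as_real(x.real, x.imag, rot.real, rot.imag)
```

Applied to tensors, the same rewrite replaces the complex `freqs_cis` multiply with elementwise ops on the real/imaginary halves, which inductor can lower.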

The tentative tests we could add:
1. test that the llama debug model init and forward/backward work
2. test that checkpoint save/load works
3. metrics logging test (metrics to be added)

better_engineering
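A framework-agnostic sketch of test 2 (checkpoint save/load round-trip); a real torchtitan test would use the library's own checkpoint API and a model `state_dict` rather than this toy dict-plus-pickle stand-in:

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    # Toy stand-in for a real checkpoint writer.
    with open(path, "wb") as f:
        pickle.dump(state, f)


def load_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def test_checkpoint_roundtrip():
    state = {"step": 100, "weights": [0.1, -0.2, 0.3]}
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "ckpt.pkl")
        save_checkpoint(state, path)
        # The loaded state must match what was saved, bit for bit.
        assert load_checkpoint(path) == state
```

The same round-trip shape (save, load, compare state) carries over directly once it targets the real checkpoint manager.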

Showcase context parallelism here once that feature is ready.

enhancement