torchtitan
A PyTorch native library for large-scale model training
Would like to discuss adding a `dry_run` flag as a general option. What it does: if the user adds `--dry_run`, then the config file specified is still run with everything the...
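A minimal sketch of how such a flag could be wired in, assuming an argparse-based launcher; the parser layout and the `build_parser`/`main` names are hypothetical, not torchtitan's actual implementation:

```python
# Hypothetical sketch of a --dry_run option for a training launcher.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="training launcher (sketch)")
    parser.add_argument("--config", type=str, help="path to the job config file")
    parser.add_argument(
        "--dry_run",
        action="store_true",
        help="validate the config and build the model, but skip training",
    )
    return parser

def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    if args.dry_run:
        # Config parsing / model construction would run here, then we exit
        # before entering the training loop.
        print("dry run: config validated, skipping training")
        return
    # ... normal training path ...
```

With `action="store_true"`, the flag defaults to `False`, so existing invocations are unaffected.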
In parallelism/__init__.py, `build_mesh` uses `zip` with the kwarg `strict=True`:
```
for d, name in zip(
    [self.dp, self.sp, self.pp], ["dp", "sp", "pp"], strict=True
):
```
This is apparently a Python 3.10+ keyword, ...
Currently, torch compiling the default llama model will generate a warning about being unable to lower complex numbers:
```
torch/_inductor/lowering.py:1639: UserWarning: Torchinductor does not support code generation for complex operators....
```
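The complex ops come from the rotary embedding (`freqs_cis` multiplication). One common workaround, shown here as a sketch rather than torchtitan's code, is to express the rotation with real-valued cos/sin arithmetic, which Inductor can lower:

```python
# Sketch: rotary position embedding using real arithmetic instead of
# complex multiplication (which Inductor cannot codegen).
import torch

def apply_rope_real(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # x: (..., dim) with dim even; cos/sin: (..., dim // 2).
    # Adjacent pairs (x1, x2) play the role of the real/imaginary parts.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin   # real part of (x1 + i*x2) * e^{i*theta}
    out[..., 1::2] = x1 * sin + x2 * cos   # imaginary part
    return out
```

This computes the same rotation as `torch.view_as_complex(...) * freqs_cis` but never materializes a complex tensor, so the warning does not fire.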
The tentative tests we could add:
1. test that the llama debug model init and forward/backward work
2. test that checkpoint save/load works
3. metrics logging test (metrics to be added)
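The first test above could look roughly like this; `build_debug_model` is a hypothetical stand-in for the real llama debug model constructor:

```python
# Sketch of a forward/backward smoke test for a tiny debug model.
import torch
import torch.nn as nn

def build_debug_model(vocab_size: int = 32, dim: int = 8) -> nn.Module:
    # Stand-in for the llama debug model: tiny embedding + linear head.
    return nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))

def test_debug_model_forward_backward():
    model = build_debug_model()
    tokens = torch.randint(0, 32, (2, 4))   # (batch, seq)
    logits = model(tokens)                  # forward pass
    loss = logits.float().mean()
    loss.backward()                         # backward pass
    # Every parameter should have received a gradient.
    assert all(p.grad is not None for p in model.parameters())
```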
Showcase context parallelism here once that feature is ready.