Any plan to add Llama 1B and/or 3B models?
Wondering if there is any plan to add the 1B and/or 3B models to the TorchTitan set of example models? It should be fairly straightforward, if I am not missing anything: another toml file and additions in a few places. The optimizer and lr_scheduler sections may require some trial and error.
I think we can start with adding the configs to https://github.com/pytorch/torchtitan/blob/main/torchtitan/models/llama/__init__.py#L29. Please submit PRs if you are interested in helping.
In terms of the toml file, I think it'll be lower priority because:
- FSDP would be enough
- we currently don't have the bandwidth to test/optimize performance
Yes, FSDPv2 should be enough for both. I'd be glad to submit a PR that just functionally enables pretraining of the 1B and 3B models, if you are willing to review and merge.
Let's first add the 1B and 3B configs to https://github.com/pytorch/torchtitan/blob/main/torchtitan/models/llama/__init__.py#L29, but maybe not the toml files, as those require comprehensive testing.
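For reference, a sketch of what the new entries might look like. The field names mirror TorchTitan's `TransformerModelArgs` (a simplified stand-in class is defined here so the snippet is self-contained), and the hyperparameters are the published Llama 3.2 1B/3B values; the `ffn_dim_multiplier`/`multiple_of` pairs are chosen so the standard Llama FFN sizing formula lands on an 8192 intermediate dimension for both models, but verify everything against the official checkpoints before merging:

```python
from dataclasses import dataclass

# Simplified stand-in for torchtitan's TransformerModelArgs, for illustration
# only; the real class lives in torchtitan/models/llama/model.py.
@dataclass
class TransformerModelArgs:
    dim: int
    n_layers: int
    n_heads: int
    n_kv_heads: int
    ffn_dim_multiplier: float
    multiple_of: int
    rope_theta: float = 500000.0

# Proposed additions to llama3_configs, using Llama 3.2 hyperparameters.
llama3_configs = {
    "1B": TransformerModelArgs(
        dim=2048, n_layers=16, n_heads=32, n_kv_heads=8,
        ffn_dim_multiplier=1.5, multiple_of=256,
    ),
    "3B": TransformerModelArgs(
        dim=3072, n_layers=28, n_heads=24, n_kv_heads=8,
        ffn_dim_multiplier=1.0, multiple_of=256,
    ),
}

def ffn_hidden_dim(args: TransformerModelArgs) -> int:
    """Llama-style FFN sizing: 2/3 of 4*dim, scaled, rounded up to multiple_of."""
    hidden = int(2 * (4 * args.dim) / 3)
    hidden = int(args.ffn_dim_multiplier * hidden)
    return args.multiple_of * ((hidden + args.multiple_of - 1) // args.multiple_of)

# Sanity check: both models should resolve to an 8192-wide FFN.
assert ffn_hidden_dim(llama3_configs["1B"]) == 8192
assert ffn_hidden_dim(llama3_configs["3B"]) == 8192
```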
Created the PR: https://github.com/pytorch/torchtitan/pull/1040