Any plan to add Llama 1B and/or 3B models?
Wondering if there is any plan to add the 1B and/or 3B models to the TorchTitan set of example models? It should be fairly straightforward, if I am not missing anything: another toml file and additions in a few places. The optimizer and lr_scheduler sections may require some trial and error.
I think we can start with adding the configs to https://github.com/pytorch/torchtitan/blob/main/torchtitan/models/llama/__init__.py#L29. Please submit PRs if you are interested in helping.
In terms of the toml file, I think it'll be lower priority because:
- FSDP would be enough
- we currently don't have the bandwidth to test/optimize performance
Yes, FSDPv2 should be enough for both. I'd be glad to submit a PR that just functionally enables pretraining of the 1B and 3B models, if you are willing to review and merge.
Let's first add the 1B and 3B configs to https://github.com/pytorch/torchtitan/blob/main/torchtitan/models/llama/__init__.py#L29, but maybe not the toml files, as those require comprehensive testing.
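For reference, a sketch of what the new entries might look like. The field names mirror TorchTitan's `TransformerModelArgs` (a simplified stand-in class is defined here so the snippet is self-contained), and the hyperparameters are the published Llama 3.2 1B/3B values; the `ffn_dim_multiplier`/`multiple_of` pairs are chosen so the standard Llama FFN sizing formula lands on an 8192 intermediate dimension for both models, but verify everything against the official checkpoints before merging:

```python
from dataclasses import dataclass

# Simplified stand-in for torchtitan's TransformerModelArgs, for illustration
# only; the real class lives in torchtitan/models/llama/model.py.
@dataclass
class TransformerModelArgs:
    dim: int
    n_layers: int
    n_heads: int
    n_kv_heads: int
    ffn_dim_multiplier: float
    multiple_of: int
    rope_theta: float = 500000.0

# Proposed additions to llama3_configs, using Llama 3.2 hyperparameters.
llama3_configs = {
    "1B": TransformerModelArgs(
        dim=2048, n_layers=16, n_heads=32, n_kv_heads=8,
        ffn_dim_multiplier=1.5, multiple_of=256,
    ),
    "3B": TransformerModelArgs(
        dim=3072, n_layers=28, n_heads=24, n_kv_heads=8,
        ffn_dim_multiplier=1.0, multiple_of=256,
    ),
}

def ffn_hidden_dim(args: TransformerModelArgs) -> int:
    """Llama-style FFN sizing: 2/3 of 4*dim, scaled, rounded up to multiple_of."""
    hidden = int(2 * (4 * args.dim) / 3)
    hidden = int(args.ffn_dim_multiplier * hidden)
    return args.multiple_of * ((hidden + args.multiple_of - 1) // args.multiple_of)

# Sanity check: both models should resolve to an 8192-wide FFN.
assert ffn_hidden_dim(llama3_configs["1B"]) == 8192
assert ffn_hidden_dim(llama3_configs["3B"]) == 8192
```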
Created the PR: https://github.com/pytorch/torchtitan/pull/1040