Kartikay Khandelwal
#### Context
As per title.

#### Changelog
- Builder function + config

#### Test plan
- Trained for one epoch with the following loss
- Training Speed ...
## Context
On a single device, our current Llama 7B full fine-tune recipe either OOMs with the `AdamW` optimizer, or takes > 55GB with `SGD`. Given the importance of single device...
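For rough intuition on why `AdamW` OOMs where `SGD` fits (barely), a back-of-envelope memory estimate helps. This sketch assumes fp32 weights, gradients, and optimizer state, ignores activations, and treats `SGD` as momentum-free (these assumptions are mine, not stated in the issue):

```python
# Rough single-device memory estimate for a 7B-parameter full fine-tune.
# Assumption: fp32 everywhere; AdamW keeps two fp32 moment buffers per
# parameter, plain SGD (no momentum) keeps none. Activations excluded.
NUM_PARAMS = 7e9
BYTES_FP32 = 4

def estimate_gib(optimizer_state_copies: int) -> float:
    """Weights + gradients + optimizer state, in GiB."""
    copies = 2 + optimizer_state_copies  # weights + grads + optimizer state
    return NUM_PARAMS * BYTES_FP32 * copies / 2**30

print(f"SGD (no momentum): ~{estimate_gib(0):.0f} GiB")
print(f"AdamW (2 moments): ~{estimate_gib(2):.0f} GiB")
```

Under these assumptions SGD lands in the low 50s of GiB, consistent with the "> 55GB" observation once activations are added, while AdamW's two extra fp32 buffers roughly double that.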
Creating a single tracker for potential feature requests that are currently not on the roadmap. This will help with tracking and prioritization, and remove issues with no context and no...
@iseeyuan had a great suggestion that we should add information about expected run time to our different tutorials and docs. Otherwise it's unclear to a first-time user what...
On 6 GPUs this is taking ~30GB/device, which doesn't seem right. This needs some debugging.
Make sure the RoPE embeddings and norms are being correctly computed when training with full bf16.
We make heavy use of builder functions for instantiating specific model architectures from generalized building blocks. For example, the [llama2 builder function](https://github.com/pytorch-labs/torchtune/blob/main/torchtune/models/llama2.py#L77) is used to stitch together the components needed...
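The builder-function pattern can be sketched as follows. All names here (`TransformerBlock`, `Decoder`, `tiny_llama`, and the default hyperparameters) are illustrative stand-ins, not torchtune's actual API:

```python
# Sketch of the builder-function pattern: a plain function that stitches
# generic building blocks into one named model configuration.
from dataclasses import dataclass, field
from typing import List

@dataclass
class TransformerBlock:
    embed_dim: int
    num_heads: int

@dataclass
class Decoder:
    vocab_size: int
    layers: List[TransformerBlock] = field(default_factory=list)

def tiny_llama(vocab_size: int = 32_000,
               num_layers: int = 4,
               embed_dim: int = 256,
               num_heads: int = 8) -> Decoder:
    """Builder: one architecture = one function with its hyperparameters baked in."""
    return Decoder(
        vocab_size=vocab_size,
        layers=[TransformerBlock(embed_dim, num_heads) for _ in range(num_layers)],
    )

model = tiny_llama()
print(len(model.layers), model.layers[0].embed_dim)  # → 4 256
```

The appeal is that adding a new architecture means adding a new builder, not new component classes; the building blocks stay generic.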
Currently it's not clear to contributors that they need to run the linter before submitting PRs. We need to make this more prominent and easier.
Using this issue to track the work we need to do to enable this. We'll capture our learnings, TODOs, and PRs over here so everyone can follow along.
Important note: the tokenizer still needs some work; this will be a follow-up PR.

#### Context
What is the purpose of this PR? Is it to
- [x] add...