Philip Bontrager
Currently our batch size is a local batch size. This means that with bs=4, if you launch on 4 GPUs, each GPU gets 4 data points and your real...
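A minimal sketch of the distinction, assuming a standard `torch.distributed` data-parallel launch (the helper name `get_global_batch_size` is illustrative, not an existing API):

```python
import torch.distributed as dist

def get_global_batch_size(local_batch_size: int) -> int:
    """Illustrative helper: the effective batch size per optimizer step
    is the per-GPU (local) batch size times the number of
    data-parallel workers."""
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    return local_batch_size * world_size

# e.g. bs=4 launched on 4 GPUs: each GPU sees 4 samples, but each
# gradient step averages over 4 * 4 = 16 samples globally.
```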
#### Context
What is the purpose of this PR? Is it to
- [x] add a new feature
- [ ] fix a bug
- [ ] update tests and/or...
# [RFC] Fusion Models
**TLDR**
- Fused models are two or more pre-trained models joined together and further tuned to work as one model. This is the approach used for most SOTA...
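A rough sketch of the idea under assumed names (the `FusedModel` class and the decoder's `encoder_input` keyword are hypothetical, used only to illustrate the pattern of joining two pre-trained models with a small set of new trainable parameters):

```python
import torch
from torch import nn

class FusedModel(nn.Module):
    """Hypothetical sketch: join a pre-trained encoder (e.g. a vision
    model) with a pre-trained decoder (e.g. an LLM) and tune them to
    work as one model."""

    def __init__(self, encoder: nn.Module, decoder: nn.Module,
                 encoder_dim: int, decoder_dim: int):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        # New parameters learned during fusion tuning; the two
        # pre-trained models can remain partly or fully frozen.
        self.projection = nn.Linear(encoder_dim, decoder_dim)

    def forward(self, image: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        encoder_out = self.projection(self.encoder(image))
        # Assumed interface: the decoder attends to the projected
        # encoder states, e.g. via interleaved cross-attention layers.
        return self.decoder(tokens, encoder_input=encoder_out)
```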
#### Context What is the purpose of this PR? Is it to - [x] add a new feature - [ ] fix a bug - [ ] update tests and/or...
# [RFC] TransformerDecoderLayer Refactor
Refactor TransformerDecoder so it can be used for multimodal architectures.
**TLDR**
- Replace TransformerDecoderLayer with TransformerSelfAttention and TransformerCrossAttention
- Replace CausalSelfAttention with GroupedQueryAttention
- Support legacy...
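A sketch of the split named in the TLDR, with assumed constructor and forward signatures: the point is that the two layer types differ only in where keys and values come from, so one attention module can serve both.

```python
import torch
from torch import nn

class TransformerSelfAttentionLayer(nn.Module):
    """Sketch: queries, keys, and values all come from the decoder
    hidden states."""
    def __init__(self, attn: nn.Module, mlp: nn.Module):
        super().__init__()
        self.attn, self.mlp = attn, mlp

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(x, x)  # keys/values from x itself
        return x + self.mlp(x)

class TransformerCrossAttentionLayer(nn.Module):
    """Sketch: keys and values come from a separate encoder sequence,
    enabling multimodal architectures."""
    def __init__(self, attn: nn.Module, mlp: nn.Module):
        super().__init__()
        self.attn, self.mlp = attn, mlp

    def forward(self, x: torch.Tensor, encoder_out: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(x, encoder_out)  # keys/values from the encoder
        return x + self.mlp(x)
```

A grouped-query attention module (fewer key/value heads than query heads, generalizing both multi-head and multi-query attention) would plug in as `attn` in either layer.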
In the HF Checkpointer, we warn the user that the adapter weights can't be converted to the PEFT format and will be converted to a torchtune format, but then we...
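A minimal sketch of the flow being described; the function name and paths here are hypothetical, not the torchtune checkpointer API:

```python
import logging
import torch

logger = logging.getLogger(__name__)

def save_adapter_weights(state_dict: dict, output_dir: str) -> None:
    # Hypothetical sketch of the described behavior: warn that the
    # adapter weights won't be in PEFT format, then save them in the
    # torchtune format instead.
    logger.warning(
        "Adapter weights cannot be converted to the PEFT format; "
        "saving them in the torchtune format instead."
    )
    torch.save(state_dict, f"{output_dir}/adapter_model.pt")
```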
Image size must be divisible by the ViT patch size used in the CLIP encoder, which is 14.
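A sketch of the constraint, with illustrative numbers (the patch size of 14 is from the text above; the example image sizes are assumptions):

```python
def validate_image_size(image_size: int, patch_size: int = 14) -> None:
    """The ViT splits each image into patch_size x patch_size patches,
    so the input resolution must be an exact multiple of the patch
    size."""
    if image_size % patch_size != 0:
        raise ValueError(
            f"image_size ({image_size}) must be divisible by "
            f"the CLIP patch size ({patch_size})"
        )

validate_image_size(224)    # ok: 224 = 16 * 14
validate_image_size(336)    # ok: 336 = 24 * 14
# validate_image_size(256)  # would raise: 256 % 14 == 4
```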
With the addition of multimodal and DPO, collation is getting more varied and complicated, depending on the recipe, the model, and the type of data. To...
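One way to read the problem, sketched under assumed names (`text_collate` and the registry are illustrative, not existing torchtune APIs): each data type needs its own collation, so recipes could select a collater rather than hard-coding one.

```python
from typing import Any, Callable, Dict, List
import torch

Sample = Dict[str, Any]

def text_collate(batch: List[Sample], pad_id: int = 0) -> Dict[str, torch.Tensor]:
    """Illustrative: pad token sequences to the longest in the batch."""
    max_len = max(len(s["tokens"]) for s in batch)
    tokens = torch.full((len(batch), max_len), pad_id, dtype=torch.long)
    for i, s in enumerate(batch):
        tokens[i, : len(s["tokens"])] = torch.tensor(s["tokens"])
    return {"tokens": tokens}

# Hypothetical registry: the recipe or model config picks the collater,
# instead of every recipe re-implementing collation inline.
COLLATERS: Dict[str, Callable[..., Dict[str, torch.Tensor]]] = {
    "text": text_collate,
    # "dpo": dpo_collate,        # would pad chosen/rejected pairs
    # "multimodal": mm_collate,  # would batch tokens plus image tiles
}
```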