Philip Bontrager

Results: 9 issues by Philip Bontrager

Currently our batch size is a local batch size. This means that with bs=4, if you launch on 4 GPUs then each GPU gets 4 data points and your real...

best practice
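The distinction above between local and global batch size can be sketched in a few lines. This is an illustrative calculation only; the function and argument names are assumptions, not torchtune's actual config keys.

```python
# Sketch: deriving the effective (global) batch size from a per-device
# batch size, assuming a data-parallel launch across `num_gpus` devices.

def global_batch_size(local_batch_size: int, num_gpus: int) -> int:
    """Each device receives `local_batch_size` samples per step, so the
    number of samples consumed per optimizer step scales with devices."""
    return local_batch_size * num_gpus

# With bs=4 on 4 GPUs, one optimizer step actually consumes 16 samples.
print(global_batch_size(4, 4))  # 16
```

This is why a per-device batch size silently changes the effective batch size (and thus learning dynamics) when the number of GPUs changes.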

#### Context What is the purpose of this PR? Is it to - [x] add a new feature - [ ] fix a bug - [ ] update tests and/or...

CLA Signed


# [RFC] Fusion Models **TLDR** - Fused models are two or more pre-trained models joined together and further tuned to work as one model. This is the approach used for most SOTA...

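The fusion idea in the RFC above can be sketched minimally: a wrapper that joins a pre-trained encoder and a pre-trained decoder and routes one modality into the other. This is an illustrative sketch, not torchtune's actual fusion API; the class and keyword names are assumptions.

```python
# Minimal sketch of a "fusion" model: two pre-trained models joined and
# tuned to act as one. Plain Python callables stand in for nn.Modules.

class FusionModel:
    """Wraps a pre-trained encoder (e.g. vision) and a pre-trained
    decoder (e.g. a language model); typically only the newly added
    fusion parameters would be trained at first."""

    def __init__(self, encoder, decoder):
        self.encoder = encoder
        self.decoder = decoder

    def __call__(self, tokens, images=None):
        # Encode the extra modality, then condition the decoder on it.
        encoder_output = self.encoder(images) if images is not None else None
        return self.decoder(tokens, encoder_input=encoder_output)

# Toy stand-ins to show the call flow:
encoder = lambda imgs: [len(i) for i in imgs]            # fake embeddings
decoder = lambda toks, encoder_input=None: (toks, encoder_input)
model = FusionModel(encoder, decoder)
print(model([1, 2, 3], images=[[0.0] * 4]))  # ([1, 2, 3], [4])
```

Text-only batches simply skip the encoder, which is what lets a fused model keep serving as a plain language model.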


# [RFC] TransformerDecoderLayer Refactor Refactor TransformerDecoder so it can be used for multimodal architectures. **TLDR** - Replace TransformerDecoderLayer with TransformerSelfAttention and TransformerCrossAttention - Replace CausalSelfAttention with GroupedQueryAttention - Support legacy...

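The core of the GroupedQueryAttention change mentioned in the RFC above is that there are fewer key/value heads than query heads, with each k/v head shared by a group of query heads. A minimal sketch of the head-grouping step, using plain Python lists in place of tensors (names here are illustrative):

```python
# Sketch of the GQA idea: repeat each k/v head so that every query head
# has a matching k/v head. With 8 query heads and 2 kv heads, each kv
# head serves a group of 4 query heads.

def expand_kv_heads(kv_heads, num_q_heads):
    """Expand kv heads to match the number of query heads."""
    group_size = num_q_heads // len(kv_heads)
    return [head for head in kv_heads for _ in range(group_size)]

kv = ["kv0", "kv1"]
print(expand_kv_heads(kv, 8))
# ['kv0', 'kv0', 'kv0', 'kv0', 'kv1', 'kv1', 'kv1', 'kv1']
```

Multi-head attention and multi-query attention fall out as the two extremes (as many kv heads as query heads, or a single kv head), which is one reason a single GQA module can subsume CausalSelfAttention.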

In the HF Checkpointer, we warn the user that the adapter weights can't be converted to the PEFT format and will be converted to a torchtune format, but then we...

bug

Image size must be divisible by the ViT patch size in the CLIP encoder, which is 14.

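The constraint above follows from how a ViT tokenizes its input: the image is cut into non-overlapping patches, so the patch grid must tile the image exactly. A quick illustrative check (the function name and default are assumptions for this sketch, though 14 is the patch size stated above):

```python
# A ViT splits the image into non-overlapping square patches, so the
# image size must be a multiple of the patch size (14 here) or the
# patch grid cannot tile the image exactly.

def num_patches(image_size: int, patch_size: int = 14) -> int:
    if image_size % patch_size != 0:
        raise ValueError(
            f"image size {image_size} is not divisible by patch size {patch_size}"
        )
    return (image_size // patch_size) ** 2

print(num_patches(224))  # 224 / 14 = 16 patches per side -> 256 patches
```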

With the addition of multimodal and DPO, collation is getting more varied and complicated, depending on the recipe, the model, and the type of data being fed to the model. To...
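As a baseline for the variation described above, text-only collation is just right-padding variable-length token sequences to a common length; multimodal and preference (DPO) data each need different handling on top of this. A minimal sketch (the function name is an assumption, not torchtune's actual collate API):

```python
# Illustrative text-only collate: left-align and right-pad
# variable-length token sequences so the batch is rectangular.

def padded_collate(batch, pad_id=0):
    """Pad each sequence in `batch` to the length of the longest one."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]

print(padded_collate([[1, 2, 3], [4]]))  # [[1, 2, 3], [4, 0, 0]]
```

Multimodal batches additionally have to carry image tensors alongside the padded tokens, and DPO batches carry chosen/rejected pairs, which is why a single collate function no longer fits every recipe.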