Anas Awadalla
Got it. This is currently not an option but definitely should be! I will open an issue (feel free to contribute, or if not I can do this next week)....
@Soonhwan-Kwon The issue here is that you are adding a cross-attention layer after every layer in LLaMA 7B. I am not sure what the total number of parameters is...
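For reference, here is a minimal sketch of how the cross-attention frequency is usually controlled when building the model; the paths and the value of `cross_attn_every_n_layers` below are illustrative placeholders, not the exact released configuration.

```python
from open_flamingo import create_model_and_transforms

# Sketch: space the gated cross-attention blocks out instead of inserting
# one after every LLaMA decoder layer. Paths and the value 4 are
# placeholders, not the exact released configuration.
model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path="path/to/llama-7b",   # hypothetical local path
    tokenizer_path="path/to/llama-7b",      # hypothetical local path
    cross_attn_every_n_layers=4,            # 1 would add a block after every layer
)

# Quick way to see how much the added blocks contribute:
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"total params: {total/1e9:.2f}B, trainable: {trainable/1e9:.2f}B")
```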
Hello @itzsid! For all the models we released, we trained on 120M samples from LAION and 60M from mmc4. How many samples have you trained your version on? What is...
We apply smoothing to the loss curves in the paper, so these loss plots look fine to me! Is that 10M samples of LAION and 5M samples of MMC4, then?...
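To be concrete, this is the kind of smoothing I mean (an exponential moving average, similar to the smoothing slider in TensorBoard/wandb); the weight below is an arbitrary illustration value, not the exact one used for the paper plots.

```python
def smooth(values, weight=0.9):
    """Exponential moving average over a list of loss values.
    `weight` is an illustrative choice, not the value used in the paper."""
    smoothed, last = [], values[0]
    for v in values:
        last = weight * last + (1 - weight) * v
        smoothed.append(last)
    return smoothed

# e.g. plot smooth(raw_losses) instead of raw_losses
```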
Great! We used DDP with 80GB A100s for the 9B model. You should be able to train with higher batch sizes on the 40GB ones using our FSDP implementation. You...
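In case it helps, here is a minimal sketch of what FSDP sharding looks like with the generic PyTorch API (so parameters, gradients, and optimizer state are split across the 40GB cards); this is not our exact training script, and the placeholder model and learning rate are assumptions.

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

# Run under torchrun so the process-group environment variables are set.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Placeholder module standing in for the Flamingo model you built.
model = torch.nn.Linear(4096, 4096)

mp = MixedPrecision(param_dtype=torch.bfloat16,
                    reduce_dtype=torch.bfloat16,
                    buffer_dtype=torch.bfloat16)

# FSDP shards params/grads/optimizer state across ranks instead of replicating
# them as DDP does, which is what frees up memory for larger batch sizes.
model = FSDP(model.cuda(), mixed_precision=mp)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # illustrative lr
```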
So sorry for the late reply @itzsid! I noticed that there was a typo in the mmc4 forward pass. I fixed it in #250 and I anticipate this is what...
Hmm, no, we don't run into these. Just to confirm, are you using torch 2.0.1?
This is how downstream validation performance changes for COCO and VQAv2 for the 9B model. Our experience with VQA performance is that it...
Ah, ok, that could be the reason, because we do use the full set. Especially since you hit ~37, and assuming this is zero-shot, this would match what we...
Thanks for the contribution @ajtejankar! I assigned @jpgard, our expert on classification tasks :).