open_clip
OOM with batch size 1 with ViT-bigG on a 40GB GPU
Similar to https://github.com/mlfoundations/open_clip/issues/261, I'm getting OOM with batch size 1 on a 40GB GPU with ViT-G.
Weird. I once tested ViT-g-14 on an RTX 3090 (10G) and it worked, so you could refer to that. Maybe you could try multiple machines.
Sorry, I meant bigG, not g.
Sorry for the misunderstanding.
I think we've got two 'easy' options right now: DeepSpeed ZeRO (the PR for this, #264, might be worth testing) or PyTorch native FSDP. I was talking w/ someone close to TPUs & PyTorch XLA recently, and they were strongly recommending giving FSDP a try for large-scale runs (there's both an XLA-specific variant and the normal PyTorch one); a rough FSDP sketch follows below.
Going full tensor parallelism is more work, and I feel things are about to change w/ upcoming native PyTorch features (compilation w/ annotations for parallelism) such that needing to do it Megatron-style will be a thing of the past.
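For reference, here's a minimal sketch of what the FSDP route could look like with an open_clip model. This is not a tested recipe for open_clip training (the official integration for sharded training is the DeepSpeed PR mentioned above); the model name, wrap policy threshold, and launch assumptions (torchrun with a NCCL process group) are illustrative.

```python
# Rough sketch: shard a large open_clip model across GPUs with PyTorch native FSDP.
# Assumes launch via torchrun so the distributed env vars are set; not a full
# training loop, just the wrapping step.
import functools

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

import open_clip

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Build the model without pretrained weights just for illustration.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-bigG-14", pretrained=None
)

# Shard parameters, gradients, and optimizer state across ranks so no single
# GPU has to hold the whole model plus Adam state. The 10M-parameter wrap
# threshold is an arbitrary illustrative choice.
fsdp_model = FSDP(
    model.cuda(),
    auto_wrap_policy=functools.partial(
        size_based_auto_wrap_policy, min_num_params=int(1e7)
    ),
)

optimizer = torch.optim.AdamW(fsdp_model.parameters(), lr=1e-4)
```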
Seems like progress is being made with FSDP, and we also think the OOM was because of model size + activations.
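A back-of-the-envelope check of the "model size + activations" explanation, assuming ViT-bigG-14 is roughly 2.5B parameters (image + text towers) and plain fp32 Adam without any sharding:

```python
# Rough, unsharded fp32 Adam memory estimate; approximations, not measurements.
params = 2.5e9                 # assumed ~2.5B parameters for ViT-bigG-14
weights = params * 4           # fp32 weights
grads = params * 4             # fp32 gradients
adam_state = params * 8        # Adam exp_avg + exp_avg_sq, fp32

total_gb = (weights + grads + adam_state) / 1024**3
print(f"params + grads + optimizer state ≈ {total_gb:.0f} GB")  # ≈ 37 GB
```

Under those assumptions, parameters, gradients, and optimizer state alone come to roughly 37 GB, so even batch size 1 activations push a 40GB card over the edge, which is consistent with sharding (ZeRO/FSDP) or gradient checkpointing being needed rather than a smaller batch.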