LAVIS
What distributed training framework was used to train the BLIP-2 ViT-G FlanT5xxl model?
The "BLIP-2 ViT-G Flan-T5-XXL" model has 12.1B parameters. How do you load a model this large into CUDA memory? Do you use techniques like model sharding, or a framework like DeepSpeed?