Redamancy
Hello, I have the same confusion as you. I don’t understand the role of `torch.gather()` after `torch.cat([txt_emb, img_emb], dim=1)`. Have you found the answer to this question?
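
For anyone else puzzling over this, here is a minimal sketch of what the call pattern does mechanically. It is not taken from the repository: the shapes, the `gather_index` values, and the idea that the index moves a padded text slot behind the image slots are all assumptions made for illustration. The only thing it relies on is the documented behavior of `torch.gather`, which for `dim=1` computes `out[b, i, h] = joint[b, index[b, i, h], h]`.

```python
import torch

# Hypothetical shapes: batch=2, 3 text slots, 2 image slots, hidden size 4.
txt_emb = torch.arange(2 * 3 * 4, dtype=torch.float32).reshape(2, 3, 4)
img_emb = 100 + torch.arange(2 * 2 * 4, dtype=torch.float32).reshape(2, 2, 4)

# Concatenate along the sequence dimension: result has shape (2, 5, 4).
joint = torch.cat([txt_emb, img_emb], dim=1)

# Hypothetical per-sample positions to read from `joint`.
# Sample 0 keeps the original order. Sample 1 is assumed to have only
# 2 real text tokens, so its image slots (3, 4) are read before the
# padded text slot (2).
gather_index = torch.tensor([[0, 1, 2, 3, 4],
                             [0, 1, 3, 4, 2]])

# torch.gather needs an index with the same number of dims as the input,
# so expand it over the hidden dimension: (2, 5) -> (2, 5, 4).
index = gather_index.unsqueeze(-1).expand(-1, -1, joint.size(-1))

# Reorder each sample's sequence positions independently.
out = torch.gather(joint, dim=1, index=index)
```

In this sketch the gather is just a per-sample reordering of the concatenated sequence, so each row of the batch can use a different layout. Whether the model in question uses it for padding removal or something else depends on how its index tensor is built.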
Hello, I ran into the same problem as you on a 3090, and I hope my method can help you. ① You should install a CUDA version newer than 10.0 in your container. I installed...
> I used 8 A100 GPUs with 40 GB of memory each, and training takes 3-4 days on the 4M dataset. You may want to try fp16 training or gradient checkpointing techniques...
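
As a rough illustration of the gradient checkpointing suggestion, the sketch below wraps a small stand-in module with `torch.utils.checkpoint.checkpoint`. The `block` module and tensor sizes are hypothetical and unrelated to the actual model. It only shows that the checkpointed path yields the same gradients while recomputing the block's activations during the backward pass instead of storing them.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)

# Hypothetical stand-in for one expensive block of the real model.
block = nn.Sequential(nn.Linear(16, 64), nn.GELU(), nn.Linear(64, 16))
x = torch.randn(4, 16, requires_grad=True)

# Plain forward/backward: activations inside `block` are kept for backward.
block(x).sum().backward()
grad_plain = x.grad.clone()
x.grad = None
block.zero_grad()

# Checkpointed forward/backward: activations inside `block` are not stored
# and are recomputed when backward reaches this segment.
checkpoint(block, x, use_reentrant=False).sum().backward()
grad_ckpt = x.grad.clone()

# The two paths should agree on the input gradient.
same_grads = torch.allclose(grad_plain, grad_ckpt)
```

The trade-off is extra compute for lower activation memory, which can allow a larger batch on smaller GPUs. fp16 training is a separate change, usually done with PyTorch's automatic mixed precision utilities on CUDA, and is not shown here.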