blog How to load CLIP ViT into an encoder-decoder captioning model?

Open christophschuhmann opened this issue 3 years ago • 0 comments

Hi, I tried to load the CLIP ViT-L image encoder as the encoder for your encoder-decoder captioning model.

But it raises an error, apparently because it does not recognize that the CLIP ViT is wrapped inside the overall CLIP model. Colab notebook: https://colab.research.google.com/drive/1kzY6UGi1cg0YyLxfgIRUqtPZPy-xTo1w?usp=sharing

Can you help me fix it? :)

I have several A100s on which I would love to train it with a bigger GPT (like GPT-J or T0), but I am not very proficient with the HF Transformers library. :)

Feb 06 '22 09:02 christophschuhmann
