How to load a CLIP ViT into an encoder-decoder captioning model?
Hi, I tried to load the CLIP ViT-L image encoder as the encoder for your encoder-decoder captioning model.
But it gives me an error, apparently because it does not understand that the CLIP ViT is wrapped inside the overall CLIP model: https://colab.research.google.com/drive/1kzY6UGi1cg0YyLxfgIRUqtPZPy-xTo1w?usp=sharing
Can you help me fix it? :)
I have several A100s on which I would love to train it with a bigger GPT (like GPT-J or T0), but I am not very proficient with the HF Transformers library. :)