How can I get the clip-vit-large-patch14-448
Hello, your project is interesting. However, the link you gave in the README is for clip-vit-large-patch14-224, and I can't find clip-vit-large-patch14-448 on Hugging Face. Could you update the link for clip-vit-large-patch14-448?
Thanks for your interest. There is no original clip-vit-large-patch14-448 on the Hugging Face Hub. We applied positional embedding interpolation to adapt the original 224-resolution clip-vit to support an input resolution of 448.
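For anyone else landing here: a minimal sketch of what that interpolation typically looks like, assuming the standard CLIP ViT layout (a learned class token followed by a square grid of patch position embeddings; with patch size 14, that is 224/14 = 16 patches per side at 224 and 448/14 = 32 per side at 448). The function name and shapes are my own illustration, not necessarily the repo's exact code:

```python
import torch
import torch.nn.functional as F

def interpolate_pos_embed(pos_embed: torch.Tensor, new_grid: int) -> torch.Tensor:
    """Bicubically interpolate a ViT positional-embedding table.

    pos_embed: (1, 1 + old_grid**2, dim), class-token embedding first.
    new_grid:  patches per side at the new resolution (448 / 14 = 32).
    """
    cls_tok, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]
    dim = pos_embed.shape[-1]
    old_grid = int(patch_pos.shape[1] ** 0.5)
    # (1, N, D) -> (1, D, H, W) so we can interpolate spatially
    patch_pos = patch_pos.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    patch_pos = F.interpolate(patch_pos, size=(new_grid, new_grid),
                              mode="bicubic", align_corners=False)
    # back to (1, new_grid**2, D) and re-attach the class token unchanged
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)
    return torch.cat([cls_tok, patch_pos], dim=1)

# Hypothetical example: ViT-L/14 has embedding dim 1024.
old = torch.randn(1, 1 + 16 * 16, 1024)   # 224x224 -> 257 positions
new = interpolate_pos_embed(old, new_grid=32)
print(new.shape)  # torch.Size([1, 1025, 1024]) for 448x448
```

After replacing the model's position-embedding parameter with the interpolated table (and updating the image size in the preprocessing config), the 224 checkpoint can accept 448 inputs; fine-tuning afterwards lets the embeddings adapt to the new resolution.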
Thank you very much for the information. One question: do we need to implement the positional embedding interpolation ourselves to adapt the original clip-vit model, which supports 224 input, to an input resolution of 448? Thank you for your response!
The paper mentions that all modules are trained. Does that include the CLIP model? If so, could you please provide the fine-tuned CLIP weights? Otherwise, it is difficult to reproduce the results.
Thanks in advance!