CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image.
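For context, here is a minimal zero-shot prediction sketch along the lines of the repository's README; the `"CLIP.png"` path is a placeholder for any local image:

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Encode one image and a few candidate captions
image = preprocess(Image.open("CLIP.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    # logits_per_image has shape (1, num_texts); softmax gives caption probabilities
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)  # the highest-probability caption is the prediction
```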
Not really an issue; I just want to share my training code, since some people still have difficulty writing it. Just modify the code to suit...
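As a reference point, a hedged sketch of what such a training step typically looks like: the symmetric contrastive loss from the paper applied through the public `clip` API. The `dataloader` yielding `(images, texts)` batches and all hyperparameters are illustrative assumptions, not the issue author's actual code:

```python
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device, jit=False)
model.float()  # the released weights are fp16 on CUDA; fp32 is simpler to optimize
optimizer = torch.optim.Adam(model.parameters(), lr=1e-6)  # illustrative LR

for images, texts in dataloader:  # hypothetical DataLoader of (tensor batch, list of captions)
    images = images.to(device)
    tokens = clip.tokenize(texts).to(device)

    # Scaled cosine-similarity matrices between all images and all texts in the batch
    logits_per_image, logits_per_text = model(images, tokens)
    labels = torch.arange(len(images), device=device)  # matching pairs lie on the diagonal

    # Symmetric cross-entropy over both directions, as in the paper
    loss = (F.cross_entropy(logits_per_image, labels)
            + F.cross_entropy(logits_per_text, labels)) / 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```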
`Swin Transformer` achieves higher accuracy than `ViT` at a similar model size and computational cost. I think that using CLIP's method and dataset would show even higher performance. - ViT-B/16, 384x384, 86M,...
In [this paper](https://arxiv.org/pdf/2103.00020.pdf) there is only a vague description of the WIT dataset: > ...we constructed a new dataset of 400 million (image, text) pairs collected from a...
Can I use a different method to tokenize the input prompt and still get a proper prediction, or must I use the `clip.tokenize(str)` method? I'm wondering if I can, for...
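For reference, `clip.tokenize` applies the model's own BPE vocabulary and pads to the fixed 77-token context; since the pretrained text encoder's embedding table is tied to that vocabulary, a different tokenizer will generally not yield proper predictions without retraining. A quick check of its output:

```python
import clip

# clip.tokenize lowercases, applies the model's BPE vocabulary, adds
# start/end tokens, and pads to the fixed 77-token context length
tokens = clip.tokenize(["a photo of a cat"])
print(tokens.shape)  # torch.Size([1, 77])
```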
Hi, Thanks for providing this really convenient package for using the CLIP model! I've come across a problem with `build_model` when trying to reconstruct the model from a state_dict on...
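For context, `build_model` in `clip/model.py` infers the architecture hyperparameters from the keys and shapes of the `state_dict` itself. A hedged sketch of the intended usage; `"clip_weights.pt"` is a placeholder for a checkpoint saved earlier:

```python
import torch
from clip.model import build_model

# "clip_weights.pt" is a placeholder for a checkpoint saved via model.state_dict()
state_dict = torch.load("clip_weights.pt", map_location="cpu")
model = build_model(state_dict).eval()
```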
Hi, I want to train the model with my own dataset, but I have a small question; would you please help me? How do I restart training from a checkpoint? How...
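A generic PyTorch checkpointing pattern answers the resume question; this is a minimal sketch, not CLIP-specific, and the file name is illustrative:

```python
import torch

def save_checkpoint(model, optimizer, epoch, path="checkpoint.pt"):
    # Store everything needed to resume: weights, optimizer state, and progress
    torch.save({
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
    }, path)

def load_checkpoint(model, optimizer, path="checkpoint.pt"):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state_dict"])
    optimizer.load_state_dict(ckpt["optimizer_state_dict"])
    return ckpt["epoch"] + 1  # epoch to resume from
```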
Thank you for your amazing paper. I am trying to evaluate CLIP with RN50x16 on ImageNet using `output = model.encode_image(test_image)`, but get an error: `File "<stdin>", line 1, in <module>` at `output = model.encode_image(test_image)`...
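A common cause of such errors is passing a raw PIL image or an unbatched tensor to `encode_image`; it expects the preprocessed, batched tensor produced by the `preprocess` transform that `clip.load` returns. A hedged sketch, with `"test.jpg"` as a placeholder path:

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50x16", device=device)

# preprocess resizes/normalizes to the model's input resolution;
# unsqueeze(0) adds the batch dimension encode_image requires
image = preprocess(Image.open("test.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    features = model.encode_image(image)
print(features.shape)  # (1, embedding_dim); the dim depends on the model variant
```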
Hi, I have trained a CLIP model using images and their captions. Now I want to evaluate the performance of the model with metrics like precision, recall, and F1 score. How can I...
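Since each image is paired with exactly one caption, retrieval-style metrics are the usual fit; at K = 1, recall equals precision equals top-1 accuracy. A minimal sketch, assuming the embeddings were precomputed with `encode_image`/`encode_text` and L2-normalized:

```python
import torch

def recall_at_k(image_feats, text_feats, k=1):
    # image_feats, text_feats: (N, D) L2-normalized embeddings, where row i
    # of each matrix belongs to the same (image, caption) pair
    sims = image_feats @ text_feats.t()          # (N, N) cosine similarities
    topk = sims.topk(k, dim=-1).indices          # top-k retrieved captions per image
    targets = torch.arange(len(sims)).unsqueeze(1)
    return (topk == targets).any(dim=-1).float().mean().item()
```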
Thank you for your amazing paper. I am trying to evaluate CLIP with a linear probe on ImageNet, but wish to save some of the compute needed for the sweep required...
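For reference, the linear probe in the paper is ordinary logistic regression on frozen image features, with a sweep over the L2 regularization strength. A minimal sketch assuming the features and labels are precomputed; `C=0.316` is the value used in the repository's linear-probe example:

```python
from sklearn.linear_model import LogisticRegression

def linear_probe_accuracy(train_feats, train_labels, test_feats, test_labels, C=0.316):
    # Fit a logistic-regression classifier on frozen CLIP image features;
    # the paper sweeps C (inverse L2 strength) rather than fixing it
    clf = LogisticRegression(random_state=0, C=C, max_iter=1000)
    clf.fit(train_feats, train_labels)
    return clf.score(test_feats, test_labels)  # top-1 accuracy
```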
Hi, Thanks for the great work. Due to the needs of a specific task, I want to train CLIP from scratch without using BPE encoding or the 77-token length limit,...
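This is feasible because `context_length` and `vocab_size` are ordinary constructor arguments of the `CLIP` class in `clip/model.py`; the text encoder only consumes token ids, so any tokenizer that maps text into `[0, vocab_size)` can replace BPE when training from scratch. A hedged sketch with purely illustrative values:

```python
from clip.model import CLIP

# All values below are illustrative (roughly ViT-B/32-shaped), not a released config
model = CLIP(
    embed_dim=512,
    image_resolution=224,
    vision_layers=12,
    vision_width=768,
    vision_patch_size=32,
    context_length=256,   # replaces the 77-token limit
    vocab_size=30000,     # size of the custom tokenizer's vocabulary
    transformer_width=512,
    transformer_heads=8,
    transformer_layers=12,
)
```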