CLIP
How do I fine-tune/train CLIP on MNIST?
Does anyone know how to fine-tune CLIP on MNIST? If I pass in a batch of 32 images and 10 unique labels, I don't know what the loss function should be. In CLIP's approach, each image in a batch is paired with its own unique text. In my scenario, each image has exactly one label, but many images share the same label, so I do not have a unique image for each label.
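To make the question concrete, here is one option I can think of, written as a framework-free sketch rather than a known-correct recipe. Instead of the symmetric 32x32 contrastive loss, score each image against the 10 label texts (a 32x10 logit matrix) and apply ordinary cross-entropy over the classes, so repeated labels in a batch are not a problem. Everything here is an assumption for illustration: the random vectors stand in for CLIP's image and text encoder outputs, the helper names are hypothetical, and `scale=100.0` is just a placeholder for CLIP's learned logit scale.

```python
import math
import random

def normalize(v):
    # L2-normalize so that a dot product becomes a cosine similarity
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def class_logits(image_emb, text_embs, scale=100.0):
    # Scaled cosine similarity of one image against every class text
    img = normalize(image_emb)
    return [scale * sum(a * b for a, b in zip(img, normalize(t)))
            for t in text_embs]

def cross_entropy(logits, target):
    # Numerically stable -log softmax(logits)[target]
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def batch_loss(image_embs, text_embs, labels):
    # Mean cross-entropy over the batch; labels may repeat freely
    losses = [cross_entropy(class_logits(e, text_embs), y)
              for e, y in zip(image_embs, labels)]
    return sum(losses) / len(losses)

# Stand-in data (assumption): 10 class-text embeddings, 32 image embeddings
random.seed(0)
dim = 8
text_embs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(10)]
image_embs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(32)]
labels = [random.randrange(10) for _ in range(32)]

loss = batch_loss(image_embs, text_embs, labels)
print(f"mean loss over the batch: {loss:.4f}")
```

In a real setup I assume the text embeddings would come from encoding 10 prompts (for example one per digit) and the loss would be computed with the framework's built-in cross-entropy so gradients flow into the encoders. Is this a reasonable way to do it?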