x-clip
A concise but complete implementation of CLIP, with various experimental improvements from recent papers.
PR for the distributed training setup.
Hi lucidrains, try this and it will NaN within 100 steps (latest GitHub code). The loss looks fine before the NaN.

```
import torch
torch.backends.cudnn.allow_tf32 = True
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark...
```
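As context for the report above, a minimal sketch of the TF32 settings it enables before training. This is an assumption about the reporter's full setup (the snippet is truncated); disabling these flags is a common first check when a contrastive loss suddenly NaNs, since TF32 trades matmul precision for speed.

```python
import torch

# TF32 flags as enabled in the report (the benchmark line is truncated there;
# `= True` is assumed here for illustration).
torch.backends.cudnn.allow_tf32 = True
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True

# To rule TF32 out as the cause of the NaN, flip both allow_tf32 flags to
# False and rerun; full float32 matmuls remove the reduced-precision path.
```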
Hi, nice work with x-clip. Hoping to play around with it and eventually combine it into your DALLE2 work. Currently having some trouble training on roughly 30k image-text pairs. Loss...
Will start with:
1. FILIP https://arxiv.org/abs/2111.07783
2. CLOOB https://arxiv.org/abs/2110.11316
3. https://arxiv.org/abs/2110.05208
Can we extract embeddings of size (say dim_text = 256, dim_image = 256) other than 512 from a pre-trained CLIP?
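On the question above, a frozen pre-trained CLIP emits embeddings of a fixed size (commonly 512), so smaller embeddings are usually obtained by learning a projection on top rather than by retraining. A minimal sketch, where the projection layers and the 512-dim stand-in tensors are illustrative assumptions, not part of x-clip's API:

```python
import torch
import torch.nn as nn

# Hypothetical projection heads mapping 512-dim CLIP outputs down to 256 dims.
text_proj = nn.Linear(512, 256, bias=False)
image_proj = nn.Linear(512, 256, bias=False)

# Stand-ins for the embeddings a pre-trained CLIP would produce
# (batch of 4, 512 dims each).
pretrained_text_emb = torch.randn(4, 512)
pretrained_image_emb = torch.randn(4, 512)

# 256-dim embeddings; the heads can be trained with the same contrastive
# objective while the backbone stays frozen.
text_emb_256 = text_proj(pretrained_text_emb)
image_emb_256 = image_proj(pretrained_image_emb)
```

Training a model from scratch with `dim_latent = 256` would avoid the extra head, but that gives up the pre-trained weights.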
Loss goes negative with mock data
Dear lucidrains, thanks for your selfless contribution and outstanding work; it has been very helpful. I'm an ML beginner, so my foundation is not solid. I have a question about x_clip. I'm...
Hi and thanks for all the work done in this repository! I noticed that the implementation of the CLS token in the Vision transformer, as well as the tokens used...