Jong Wook Kim

86 comments by Jong Wook Kim

For contrastive learning, you would be comparing all possible pairs of text and images (6x6=36 pairs, in your case), not just 6 pairs that happen to have the same index....
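As a rough illustration of the all-pairs comparison described above, here is a minimal pure-Python sketch of a symmetric cross-entropy loss over an N x N similarity matrix. The function names `log_softmax` and `contrastive_loss` are hypothetical, and the sketch assumes that `logits[i][j]` already holds the scaled similarity between image `i` and text `j`, with matching pairs on the diagonal. It is not taken from the repository's code.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of a list of floats."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(logits):
    """Symmetric cross-entropy over all N x N image-text pairs.

    Assumption: logits[i][j] is the similarity of image i and text j,
    and the correct text for image i sits at index i (the diagonal).
    """
    n = len(logits)
    # Image -> text: each row is a classification over the N texts.
    loss_img = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> image: each column is a classification over the N images.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_txt = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_img + loss_txt) / 2


# A batch of 6 gives a 6 x 6 matrix, so all 36 pairs enter the loss.
n = 6
aligned = [[10.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
uniform = [[0.0] * n for _ in range(n)]
print(contrastive_loss(aligned) < contrastive_loss(uniform))
```

With uninformative (all-equal) similarities the loss equals log(6), the cost of guessing among six candidates; it drops toward zero as the diagonal entries dominate their rows and columns.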

What vinson2233 suggested in the comment is simpler, and it might be easier to do so because you'd be able to keep using the same loss formulation, as long as...

I haven't tried it, but it appears that you would need to do the inverse of what's done in https://github.com/mlfoundations/open_clip/blob/74a72f3a4829656a9cfd8bae02253e2d28ab05d1/src/open_clip/model.py#L341-L391 in order to get a `state_dict` object compatible with this repo.

Hello! I suspect the class labels were inadvertently shuffled or mismatched. To verify, it would help to inspect the classification results for the individual images. As in the [example...

These both worked well for us:
- PyTorch's stock LBFGS: https://pytorch.org/docs/stable/generated/torch.optim.LBFGS.html
- A third-party implementation that has more features: https://github.com/hjmshi/PyTorch-LBFGS

Hi, thank you for bringing attention to the earthquake in Syria and Turkey. We deeply sympathize with the people affected by this tragedy, but I'll close this PR for now....

Not out of the box, but there has been some work using CLIP to perform object detection/segmentation, e.g. #82

Yes. `argmax` returns the position of the largest value in the input, which is the EOT token. Because of the autoregressive mask, SOT (or CLS for the same purpose) at the beginning position...
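To make the `argmax` point concrete, here is a small pure-Python sketch. It assumes that the EOT token has the highest id of any token in the sequence and that padding uses id 0. The specific ids below (49406 for SOT, 49407 for EOT) and the two word ids are illustrative assumptions, and `eot_position` is a hypothetical helper, not a function from the repository.

```python
# Assumed ids for illustration: EOT must be the largest id in the sequence.
SOT, EOT, PAD = 49406, 49407, 0


def eot_position(tokens):
    """Index of the largest token id, i.e. a plain-Python argmax.

    Ties resolve to the first occurrence.
    """
    return max(range(len(tokens)), key=tokens.__getitem__)


# [SOT, two hypothetical word ids, EOT, padding...]
tokens = [SOT, 320, 1125, EOT] + [PAD] * 4
pos = eot_position(tokens)
print(pos, tokens[pos] == EOT)
```

Because EOT is larger than every other id in the sequence, the argmax over the token ids lands on the EOT position, so the feature at that position can be picked out without storing sequence lengths separately.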

Hi! I realized I fixed the same issue in #1033 without reviewing this PR. Sorry! Please feel free to reopen if I missed anything in that fix.

Hi, it's our bad that we didn't properly specify the dependency versions. Could you try with `transformer==2.9.1` and see if that loads properly?