Regarding IMAGENET1K_V1 and IMAGENET1K_V2 weights

Open asusdisciple opened this issue 10 months ago • 0 comments

🐛 Describe the bug

I found a very strange "bug" while trying to find similar instances in a vector database of pictures. The model I used is ResNet50. The problem occurs only with the IMAGENET1K_V2 weights and does not appear with the legacy V1 weights (see https://pytorch.org/blog/how-to-train-state-of-the-art-models-using-torchvision-latest-primitives/).

When I calculate the cosine similarity with V1 weights for two almost identical pictures, I get values > 0.95; however, when I use V2 weights with the same pictures, I get values < 0.7. In layman's terms, with V2 nearly identical pictures are no longer recognized as such. I have included two example pictures and the code to reproduce the problem below. Does anybody have a concise explanation for this behaviour?
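For reference, the cosine similarity used here is the dot product of the two feature vectors divided by the product of their Euclidean norms. A minimal NumPy sketch follows; `cosine_similarity` is an illustrative helper name, not a library function.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (hypothetical helper)."""
    # Flatten so that (1, D) and (D,) inputs are handled the same way.
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

The value depends only on the direction of the vectors, not on their length, so rescaling one embedding does not change the score.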

When you increase the target size in the preprocessing resize, i.e. transform.resize((x, y)), the problem gradually vanishes; however, this is not really a good solution, since larger inputs add overhead during inference.

Would be happy for any insights on this topic :)

from torchvision import models
from torchvision.models import ResNet50_Weights
import torchvision.io
from torch import nn
import numpy as np
from numpy.linalg import norm

class Identity(nn.Module):
    def __init__(self):
        super(Identity, self).__init__()

    def forward(self, x):
        return x

# Get weights (switch to ResNet50_Weights.IMAGENET1K_V2 to reproduce the problem)
weights = ResNet50_Weights.IMAGENET1K_V1
preprocess = weights.transforms()

# Build the model and replace the classifier head so it returns the pooled features
model = models.resnet50(weights=weights).to("cuda:0")
model.fc = Identity()

def embed(path):
    # Read the image as a uint8 tensor, add a batch dimension and move it to the GPU
    image = torchvision.io.read_image(path).unsqueeze(dim=0).to("cuda:0")
    return model(preprocess(image)).cpu().detach().numpy().squeeze()

a = embed("/raid/..../datasets/lion/lion_ori_small.jpg")
b = embed("/raid/.../datasets/lion/lion_fake_small.jpg")

# Cosine similarity between the two feature vectors
cosine = np.dot(a, b) / (norm(a) * norm(b))
print(cosine)

Example pictures: lion_fake, lion_ori

Versions

torchvision 0.19

asusdisciple, Apr 17 '24 09:04