Regarding IMAGENET1K_V1 and IMAGENET1K_V2 weights
🐛 Describe the bug
I found a very strange "bug" while trying to find similar instances in a vector database of pictures. The model I used is ResNet50. The problem occurs only with the IMAGENET1K_V2 weights and does not appear with the legacy V1 weights (see https://pytorch.org/blog/how-to-train-state-of-the-art-models-using-torchvision-latest-primitives/).
When I calculate the cosine similarity for two almost identical pictures, I get values > 0.95 with the V1 weights, but values < 0.7 with the V2 weights for the same pictures. In layman's terms, with V2 near-identical pictures are no longer recognized as such. I have included two example pictures below and the code to reproduce the problem. Does somebody have a concise explanation for this behaviour?
When you increase the size passed to transforms.Resize((x, y)), the problem gradually begins to vanish. However, this is not really a good solution, since it adds overhead during inference.
I would be happy for any insights on this topic :)
from torchvision import models
from torchvision.models import ResNet50_Weights
import torchvision.io
from torch import nn
import numpy as np
from numpy.linalg import norm


class Identity(nn.Module):
    """Pass-through layer that replaces the classifier head so the model returns features."""

    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x


# Get weights (swap in ResNet50_Weights.IMAGENET1K_V2 to compare)
weights = ResNet50_Weights.IMAGENET1K_V1
preprocess = weights.transforms()
model = models.resnet50(weights=weights).to("cuda:0")
model.fc = Identity()

# Embed both images: read, add a batch dimension, move to the GPU, preprocess, run the model
a = model(preprocess(torchvision.io.read_image("/raid/..../datasets/lion/lion_ori_small.jpg").unsqueeze(dim=0).to("cuda:0"))).cpu().detach().numpy().squeeze()
b = model(preprocess(torchvision.io.read_image("/raid/.../datasets/lion/lion_fake_small.jpg").unsqueeze(dim=0).to("cuda:0"))).cpu().detach().numpy().squeeze()

# Cosine similarity between the two embeddings
cosine = np.dot(a, b) / (norm(a) * norm(b))
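The reproduction code above never calls model.eval(), so the network runs in its default training mode, where BatchNorm layers normalize with the statistics of the current batch rather than their stored running statistics. The sketch below shows an inference-mode variant of the embedding step. The helper names `embed` and `cosine` and the small stand-in network `toy` are hypothetical and introduced only so the example runs without weights or a GPU. Whether eval mode changes the V2 similarity reported here has not been verified.

```python
import torch
import torch.nn.functional as F
from torch import nn


@torch.no_grad()
def embed(model: nn.Module, batch: torch.Tensor) -> torch.Tensor:
    """Return one flattened feature vector per image in `batch`.

    Side effect: switches `model` to eval mode, so BatchNorm uses its stored
    running statistics instead of the statistics of the current batch.
    """
    model.eval()
    return model(batch).flatten(start_dim=1)


def cosine(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Row-wise cosine similarity between two (N, D) feature matrices."""
    return F.cosine_similarity(a, b, dim=1)


# Stand-in network (hypothetical, NOT ResNet50) so the sketch runs without
# downloading weights or needing a GPU.
torch.manual_seed(0)
toy = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3),
    nn.BatchNorm2d(4),
    nn.AdaptiveAvgPool2d(1),
)
x = torch.rand(1, 3, 32, 32)

features = embed(toy, x)
self_similarity = cosine(features, features)
print(features.shape, toy.training, self_similarity)
```

With the setup from the issue, the same helpers would be called as embed(model, preprocess(image.unsqueeze(dim=0).to("cuda:0"))) for each picture, followed by cosine on the two results.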
Versions
torchvision 0.19