
High resolution image result with NaN features

Open TurtleSmoke opened this issue 1 year ago • 5 comments

Hello,

I'm having an issue with DINOv2 when running it on high-resolution images like the one available at this link: the features returned by the model contain NaN values. This happens with all four available models, and consistently for images around the same size.

I would like to know if you have any ideas about what could be causing this problem. Here's a minimal example:

import torch
import numpy as np
import torchvision.transforms as T
from PIL import Image
import hubconf  # hubconf.py from the dinov2 repository; run this from the repo root

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dino = hubconf.dinov2_vits14().to(device)  # Same issue with larger model
img = Image.open('4k.png').convert('RGB')  # force 3 channels so Normalize matches
# PIL reports size as (width, height); count whole 14-pixel patches per axis
pw, ph = (int(n) for n in np.array(img.size) // 14)

transform = T.Compose([
    # Resize expects (height, width); snap both to a multiple of the patch size
    T.Resize((14 * ph, 14 * pw), interpolation=T.InterpolationMode.BICUBIC),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

tensor = transform(img).unsqueeze(0).to(device)  # add the batch dimension
with torch.no_grad():
    features = dino.forward_features(tensor)['x_norm_patchtokens'][0]

print(features)  # NaN
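
To give a sense of scale, here is a small sketch of the patch-grid arithmetic the snippet above relies on. It assumes a 3840 x 2160 input, which is only a guess based on the file name `4k.png`; the helper name `patch_grid` is hypothetical and not part of the DINOv2 code.

```python
PATCH_SIZE = 14  # patch size used in the snippet above


def patch_grid(width, height, patch=PATCH_SIZE):
    """Return (patches_wide, patches_high, resized_width, resized_height, num_tokens)."""
    # Whole patches that fit along each axis (same floor division as the snippet)
    pw, ph = width // patch, height // patch
    # The image is resized to exact multiples of the patch size
    return pw, ph, pw * patch, ph * patch, pw * ph


# Assumed input size: 3840 x 2160 (not confirmed for the actual image)
pw, ph, new_w, new_h, tokens = patch_grid(3840, 2160)
print(f"{pw} x {ph} patches, resized to {new_w} x {new_h}, {tokens} patch tokens")
```

Under that assumption the model receives a 3836 x 2156 image and produces 42196 patch tokens. This is meant only as context for the input size, not as an explanation of the NaN values.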

TurtleSmoke, Apr 20 '23 09:04