dinov2
High resolution image result with NaN features
Hello,
I'm having an issue with Dinov2 while trying to use it with high-resolution images like the one available at this link. The problem is that the features returned by the model contain NaN values. This issue occurs with all four available models and is consistently present for images around the same size.
I would like to know if you have any idea what could be causing this problem. Here's a minimal example:
import torch
import numpy as np
import torchvision.transforms as T
from PIL import Image
import hubconf  # hubconf.py from the dinov2 repository

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dino = hubconf.dinov2_vits14().to(device)  # Same issue with the larger models
img = Image.open('4k.png')
# Whole 14x14 patches along each side; PIL's img.size is (width, height)
pw, ph = np.array(img.size) // 14
transform = T.Compose([
    # T.Resize expects (height, width); each side is rounded down to a multiple of 14
    T.Resize((14 * ph, 14 * pw), interpolation=T.InterpolationMode.BICUBIC),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
# Keep the first three channels (drops alpha if present) and add a batch dimension
tensor = transform(img)[:3].unsqueeze(0).to(device)
with torch.no_grad():
    features = dino.forward_features(tensor)['x_norm_patchtokens'][0]
print(features)  # NaN
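For a sense of scale, the resize above rounds each side down to a multiple of 14, so the number of patch tokens grows with the image area. Below is a rough sketch of that arithmetic only. It assumes a 3840x2160 input, since the exact size of 4k.png is not stated here, and `patch_grid` is a hypothetical helper, not part of dinov2.

```python
PATCH = 14  # patch size of the dinov2_*14 models used above


def patch_grid(width, height, patch=PATCH):
    """Return (pw, ph, resized_w, resized_h, n_patches) for an image size.

    Hypothetical helper: mirrors the `// 14` and `14 * ...` steps of the
    snippet above without loading any model or image.
    """
    pw, ph = width // patch, height // patch
    return pw, ph, pw * patch, ph * patch, pw * ph


# Assumed example size (3840x2160); substitute the real img.size.
pw, ph, w, h, n = patch_grid(3840, 2160)
print(f"grid {pw}x{ph}, resized to {w}x{h}, {n} patch tokens")
```

Under that assumption the model receives a 274x154 grid, i.e. 42196 patch tokens, which may help when comparing against image sizes that do not produce NaN.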