FeatUp
DINOv2 patch size is written as 16 instead of 14 ?
Hi,
First of all, thanks for your work.
I noticed that according to the code, the patch_size for DINOv2 is expected to be 16: https://github.com/mhamilton723/FeatUp/blob/c04e4c19945ce3e98a5488be948c7cc1fdcdacc6/featup/featurizers/util.py#L27
However, to my knowledge, unlike DINO and conventional ViTs, the pretrained DINOv2 models use 14x14 patches, not 16x16 patches. This is explicitly stated here: https://github.com/facebookresearch/dinov2/blob/main/MODEL_CARD.md and in the DINOv2 paper as well. Could you clarify this?
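For illustration, here is a quick sanity check of why the patch size matters (a minimal sketch assuming square inputs; `token_grid` is a hypothetical helper, not part of FeatUp): with DINOv2's 14x14 patches, a 224x224 image yields a 16x16 token grid, whereas assuming 16x16 patches would predict a 14x14 grid and misalign any code relying on that shape.

```python
# Number of patch tokens per side for a ViT-style backbone.
# DINOv2 uses patch_size=14; DINO and vanilla ViT typically use 16.
def token_grid(image_size: int, patch_size: int) -> int:
    """Tokens per side, assuming the image size divides the patch size."""
    return image_size // patch_size

print(token_grid(224, 14))  # DINOv2: 16 tokens per side
print(token_grid(224, 16))  # DINO / vanilla ViT: 14 tokens per side
```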
To me this looks like a leftover from earlier work on the DINOv2Featurizer class, since some lines are commented out, and in the end the patch size (and every other input parameter) is not used: the small DINOv2 is loaded directly from torchhub (and it uses 14x14 patches). https://github.com/mhamilton723/FeatUp/blob/c04e4c19945ce3e98a5488be948c7cc1fdcdacc6/featup/featurizers/DINOv2.py#L433
Also, it would be very useful to have upsamplers for larger DINOv2 models and not only the small one (having base, large and giant variant would be nice :+1: ).
Fixed this typo, thank you! And I will try to make some larger versions in the next few weeks!
Could you please provide an estimated release date for the larger version of DINOv2? I'm eager to utilize the more advanced model.
Do you mean using bigger backbones like DINOv2 ViT-L/G? I am also eager to use those models if that's the case.
Yes, I am interested in DINOv2 ViT L
Hello, I'm also using the DINOv2 model. I've noticed that the resolution of hr_feat after upsampling is still 16 times that of lr_feat, not 14 times. I suspect the discrepancy comes from the forward pass of the JBUStack class in ./featup/upsamplers.py, where self.upsample is applied four times, giving a 2^4 = 16x upsampling of the low-resolution features. Perhaps it could be solved by building a new upsampler with a different scale to achieve a 14x magnification factor and then training new pretrained weights? I really admire your work and look forward to upsamplers for the larger DINOv2 variants.
I made the same observation. Currently, it seems only possible to upsample by a factor of 2 with JBU, applied multiple times (so x2, x4, x8, x16). It's not possible to do x14, unless you first upsample to x16 and then downsample to x14 (with a bilinear interpolation, for instance).
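The x16-then-downsample workaround described above can be sketched as follows (an assumption-laden sketch, not an official fix: `resize_to_patch_grid` is a hypothetical helper, and plain bilinear interpolation stands in for the actual JBU stack output):

```python
import torch
import torch.nn.functional as F

def resize_to_patch_grid(hr_feat, lr_feat, patch_size=14):
    """Resize a 16x-upsampled feature map down to the 14x target
    implied by DINOv2's 14x14 patch grid."""
    _, _, h, w = lr_feat.shape
    return F.interpolate(
        hr_feat, size=(h * patch_size, w * patch_size),
        mode="bilinear", align_corners=False)

# DINOv2-S features for a 224x224 input: a 16x16 grid of 384-dim tokens.
lr = torch.randn(1, 384, 16, 16)
# Stand-in for the JBU stack's x16 output (2^4 from four x2 stages).
hr = F.interpolate(lr, scale_factor=16, mode="bilinear", align_corners=False)
out = resize_to_patch_grid(hr, lr)  # spatial size 16 * 14 = 224 per side
```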
@mhamilton723 any plans on DINOv2 ViT-L version?
is there a checkpoint for DINOv2-Base?