FeatUp
Implicit upsampler with DINOv2 at non-square image size
Hi, thank you for sharing the great work.
The code looks like it can handle non-square images as DINOv2 input. However, I encountered an issue in downsamplers.py, which I resolved by modifying the following code:
patches = torch.nn.Unfold(self.kernel_size, stride=stride)(inputs) \
    .reshape(
        (b, self.in_dim, self.kernel_size * self.kernel_size, self.final_size, self.final_size * int(w / h))) \
    .permute(0, 3, 4, 2, 1)
to:
        (b, self.in_dim, self.kernel_size * self.kernel_size, self.final_size, int(self.final_size * (w / h)))) \
With this change, train_implicit_sampler.py worked for DINOv2 input sizes like 280x392.
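For reference, here is a minimal standalone sketch of why the change matters, using illustrative numbers (batch/channel sizes and stride are my own assumptions, not values from the repo). For a 280x392 input, w / h = 1.4 and the low-res feature width should be final_size * 1.4 = 28; the original expression final_size * int(w / h) truncates 1.4 to 1 and yields 20, which breaks the reshape:

```python
import torch

# Illustrative numbers: a non-square 280x392 input with a 20x28 low-res
# feature grid (final_size = 20 along the height). Batch size, channel
# count, and stride are assumptions for this sketch.
b, in_dim = 2, 8
h, w = 280, 392
final_size = 20
kernel_size, stride = 29, 14

# Input sized so Unfold produces exactly a 20x28 grid of patches.
out_w = int(final_size * (w / h))  # 28 -- the corrected width
inputs = torch.randn(b, in_dim,
                     (final_size - 1) * stride + kernel_size,
                     (out_w - 1) * stride + kernel_size)

# Unfold returns (b, in_dim * k * k, L) with L = final_size * out_w here.
patches = torch.nn.Unfold(kernel_size, stride=stride)(inputs) \
    .reshape(b, in_dim, kernel_size * kernel_size, final_size, out_w) \
    .permute(0, 3, 4, 2, 1)

print(patches.shape)  # torch.Size([2, 20, 28, 841, 8])
```

With the original formula, final_size * int(w / h) == 20, so the reshape target has 20 * 20 patch positions while Unfold produced 20 * 28, and the reshape raises a size-mismatch error.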
Additionally, 392 seems to be the largest input size for which this Unfold() & reshape works (at least for kernel_size=29). If there is a way to go beyond 392, please share.
Thanks,