
Implicit upsampler with DINOv2 at non-square image size

Open ttk-kstn opened this issue 3 months ago • 1 comments

Hi, thank you for sharing the great work.

The code looks like it can handle non-square images for DINOv2 input. However, I encountered an issue in downsamplers.py, which I resolved by changing the reshape in the following code:

        patches = torch.nn.Unfold(self.kernel_size, stride=stride)(inputs) \
            .reshape(
            (b, self.in_dim, self.kernel_size * self.kernel_size, self.final_size, self.final_size * int(w / h))) \
            .permute(0, 3, 4, 2, 1)

to:

            (b, self.in_dim, self.kernel_size * self.kernel_size, self.final_size, int(self.final_size * (w / h)))) \

With this change, train_implicit_sampler.py worked for DINOv2 input sizes like 280x392.
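To illustrate why the original expression breaks for this size, here is a minimal sketch of the arithmetic only. It assumes `final_size` equals the image height divided by the DINOv2 patch size of 14 (so 20 for a 280x392 input); that relationship is my assumption, not something taken from the repository's configuration.

```python
# Hypothetical values for a 280x392 input (h=280, w=392) with patch size 14.
h, w = 280, 392
final_size = h // 14  # assumed: 20 feature rows along the shorter side

# Original expression: int(w / h) truncates 1.4 to 1 before multiplying.
original = final_size * int(w / h)

# Modified expression: multiply first, then truncate.
patched = int(final_size * (w / h))

# Integer-only variant, which avoids float rounding for other aspect ratios.
integer_only = final_size * w // h

print(original, patched, integer_only)  # 20 28 28
```

The integer-only form is just a suggestion; it gives the same result here and does not rely on `w / h` rounding favourably.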

Additionally, 392 seems to be the largest input size for which this Unfold() and reshape work (at least for kernel_size=29). If there are ways to go beyond 392, please share.
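A rough way to check which sizes fit is to compare the number of sliding windows that `torch.nn.Unfold` produces with the grid the reshape expects. The sketch below is pure arithmetic and rests on two assumptions that are not shown in the snippet above: that `stride` is computed as `(h - kernel_size) // (final_size - 1)`, and that `final_size` is `h // 14`. The helper names are hypothetical.

```python
def unfold_windows(size: int, kernel_size: int, stride: int) -> int:
    """Window count along one axis for Unfold with no padding and dilation 1."""
    return (size - kernel_size) // stride + 1


def reshape_fits(h: int, w: int, kernel_size: int = 29, patch: int = 14) -> bool:
    """Return True if the unfolded grid matches the grid the reshape expects."""
    final_size = h // patch                          # assumed relationship
    stride = (h - kernel_size) // (final_size - 1)   # assumed stride formula
    rows = unfold_windows(h, kernel_size, stride)
    cols = unfold_windows(w, kernel_size, stride)
    return rows == final_size and cols == final_size * w // h


for size in [(224, 224), (280, 392), (392, 392), (406, 406), (448, 448)]:
    print(size, reshape_fits(*size))
```

Under these assumptions, 406x406 fails because the floored stride stays at 13, so Unfold yields 30 windows per axis while `final_size` is 29. That matches the limit I observed, but it is only a guess at the cause.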

Thanks,

ttk-kstn, Apr 02 '24 02:04