upsample function leads to tensor size mismatch for certain input image sizes when spatial=True
Currently, the upsample function is as follows:
```python
import torch.nn as nn

def upsample(in_tens, out_HW=(64, 64)):  # assumes scale factor is the same for H and W
    in_H, in_W = in_tens.shape[2], in_tens.shape[3]
    scale_factor_H, scale_factor_W = 1. * out_HW[0] / in_H, 1. * out_HW[1] / in_W
    return nn.Upsample(scale_factor=(scale_factor_H, scale_factor_W), mode='bilinear', align_corners=False)(in_tens)
```
This fails when the input images being compared are of resolution 800x600. In that case, one of the layers passed in as in_tens has shape (1, 1, 149, 199), so in_H * scale_factor_H = 600.0000000000001 and in_W * scale_factor_W = 799.9999999999999. Because nn.Upsample recovers the output size by flooring in_size * scale_factor, the result is an output tensor of size (1, 1, 600, 799), which raises an exception when it is added to other tensors of size (1, 1, 600, 800).
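The round-off can be reproduced without PyTorch at all: the snippet below (a sketch, using plain `math.floor` to mirror how the output size is recovered from scale_factor) plugs in the shapes from the failing 800x600 case.

```python
import math

# Spatial size of the intermediate feature map, and the target size,
# from the failing 800x600 case
in_H, in_W = 149, 199
out_H, out_W = 600, 800

# The same scale factors the original upsample() computes
scale_factor_H = out_H / in_H
scale_factor_W = out_W / in_W

# The output size is recovered as floor(in_size * scale_factor),
# so any downward rounding error silently drops a pixel
print(in_H * scale_factor_H)              # 600.0000000000001
print(in_W * scale_factor_W)              # 799.9999999999999
print(math.floor(in_H * scale_factor_H))  # 600 -- survives the round-off
print(math.floor(in_W * scale_factor_W))  # 799 -- one pixel short of 800
```

The H dimension happens to round up and survives the floor; the W dimension rounds down and loses a pixel, which is why only one axis of the output is wrong.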
Instead of computing the scale_factor, a more robust solution is to just set the size parameter directly:
```python
return nn.Upsample(size=out_HW, mode='bilinear', align_corners=False)(in_tens)
```
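As a self-contained sketch of the fixed function (assuming PyTorch is installed), the previously failing 149x199 → 600x800 case now yields the expected shape, since `size=` pins the output dimensions exactly instead of deriving them from a floating-point scale factor:

```python
import torch
import torch.nn as nn

def upsample(in_tens, out_HW=(64, 64)):
    # Passing size= directly sidesteps the floating-point scale_factor
    # round-off that truncated 799.999... down to 799
    return nn.Upsample(size=out_HW, mode='bilinear', align_corners=False)(in_tens)

# The previously failing case: a (1, 1, 149, 199) feature map upsampled to 600x800
out = upsample(torch.zeros(1, 1, 149, 199), out_HW=(600, 800))
print(tuple(out.shape))  # (1, 1, 600, 800)
```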
This might also be the cause of this specific comment: https://github.com/richzhang/PerceptualSimilarity/issues/45#issuecomment-666929172
Thanks for pointing it out. I updated it!
Just wanted to mention that this bug is still present in the pip version of the lpips library: installing via pip install lpips still ships the old upsample function.
I updated the pip package, so it should work now. Thanks!