Thank you for your amazing paper!
I am trying to evaluate CLIP with RN50x16 on ImageNet:
output = model.encode_image(test_image)
but I get this error:
File "", line 1, in <cell line: 1>
output = model.encode_image(test_image)
File "/home/user/anaconda3/envs/yolov5_4/lib/python3.8/site-packages/clip/model.py", line 337, in encode_image
return self.visual(image.type(self.dtype))
File "/home/user/anaconda3/envs/yolov5_4/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/user/anaconda3/envs/yolov5_4/lib/python3.8/site-packages/clip/model.py", line 148, in forward
x = self.attnpool(x)
File "/home/user/anaconda3/envs/yolov5_4/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/user/anaconda3/envs/yolov5_4/lib/python3.8/site-packages/clip/model.py", line 69, in forward
x = x + self.positional_embedding[:, None, :].to(x.dtype) # (HW+1)NC
RuntimeError: The size of tensor a (50) must match the size of tensor b (145) at non-singleton dimension 0
Thanks
@euminds The input_resolution for RN50x16 is 384×384, not 224×224. A 224×224 input reaches the attention pool at 224/32 = 7, giving 7×7 + 1 = 50 tokens, while the RN50x16 positional embedding expects 384/32 = 12, i.e. 12×12 + 1 = 145 tokens; those are exactly the two sizes in the error message.
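A minimal sketch of the fix, assuming the standard openai/CLIP package: use the preprocess transform returned by clip.load, which already resizes and crops to the model's own input_resolution instead of a hard-coded 224×224 ("test.png" below is a placeholder path for illustration):

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# clip.load returns the model together with a preprocess transform
# matched to that model's expected input size (384x384 for RN50x16).
model, preprocess = clip.load("RN50x16", device=device)
print(model.visual.input_resolution)  # 384

# "test.png" is a placeholder path; preprocess resizes it to 384x384.
image = preprocess(Image.open("test.png")).unsqueeze(0).to(device)

with torch.no_grad():
    output = model.encode_image(image)
print(output.shape)  # torch.Size([1, 768]) for RN50x16
```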