How do you actually use this?
Calling model(img) on an arbitrary image resized to 256 and unsqueezed to the expected shape (1, 3, 256, 256) does not actually work. What else are you supposed to do to the image before passing it to the model for inference? Very frustrating.
It should work even if the image has arbitrary dimensions, as long as the edge dimensions are multiples of 32.
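Something along these lines should be enough. This is only a rough sketch, assuming the model is already loaded as in the README and that it expects the usual ImageNet-style normalization (double-check the exact mean/std the weights were trained with):

```python
import torch
from PIL import Image
from torchvision import transforms

# assumption: `model` has already been created/loaded as described in the README
preprocess = transforms.Compose([
    transforms.Resize((256, 256)),          # any size whose edges are multiples of 32 should work
    transforms.ToTensor(),                  # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet stats; adjust if the weights
                         std=[0.229, 0.224, 0.225]),   # were trained with different normalization
])

img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)  # (1, 3, 256, 256)

model.eval()
with torch.no_grad():
    logits = model(img)
```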
Thanks for replying.
I was able to determine that my error was caused by the tensor not being sent to the GPU correctly (and it somehow ended up channels-last instead of channels-first). My apologies. I've been having a tough week trying to implement code from papers!
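In case anyone else hits the same thing, the fix looked roughly like this (just a sketch; the image path and the numpy loading route are my own example, not from the repo):

```python
import numpy as np
import torch
from PIL import Image

# load the image as a float array in [0, 1]; numpy gives channels-last (H, W, C)
img_np = np.asarray(Image.open("example.jpg").convert("RGB"), dtype=np.float32) / 255.0

# permute to channels-first (C, H, W), add the batch dim, and move to the GPU
img = torch.from_numpy(img_np).permute(2, 0, 1).unsqueeze(0).to("cuda")
```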
I am specifically trying to use your Places365 pretrained model. Should I assume that your class labels are the same as what is listed here? https://github.com/CSAILVision/places365/blob/master/categories_places365.txt
Thanks again.
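For what it's worth, this is how I'm planning to map output indices to labels. It's only a sketch, assuming the model's class ordering matches that categories_places365.txt file:

```python
# build an index -> label mapping from categories_places365.txt
# (lines look like "/a/airfield 0"; the [3:] strips the leading "/a/" prefix)
with open("categories_places365.txt") as f:
    classes = [line.strip().split(" ")[0][3:] for line in f]

# e.g. classes[12] would be the label corresponding to output index 12
```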
OK, real problem this time, sorry for the double-tap. This made me think I was crazy: using your pretrained Places365 weights and running inference with the example given here, I get the exact same results every time, no matter what input is given (I checked three times to make sure the inputs were actually different after my preprocessing, and they were).
These are those results:
torch.return_types.topk( values=tensor([[2.0731, 1.9153, 1.7019, 1.5919, 1.5876]], device='cuda:0', grad_fn=<TopkBackward0>), indices=tensor([[ 12, 67, 270, 103, 317]], device='cuda:0'))
I even tried reloading the model into a new variable, but same thing.
Is the syntax prediction = model(img) incorrect? Or is something else going on?
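For completeness, this is roughly how I'm calling it (a sketch; img is my preprocessed (1, 3, 256, 256) tensor already on the GPU, and model is loaded from your Places365 weights):

```python
import torch

model = model.to("cuda").eval()      # eval() so batchnorm/dropout run in inference mode
with torch.no_grad():                # also avoids the grad_fn showing up in the outputs
    logits = model(img)              # img: (1, 3, 256, 256) float tensor on cuda
    probs = torch.softmax(logits, dim=1)
    top5 = probs.topk(5, dim=1)

print(top5.values)
print(top5.indices)
```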
Hello, I'm having the same problem right now. Have you found a solution yet?
Can you check with other pretrained weights and see if the issue persists?
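For example, something like this sanity check with a stock torchvision model (just a sketch; img_a and img_b stand for two different preprocessed inputs, and the ImageNet ResNet-50 here is not the Places365 model from this repo):

```python
# Run the same two preprocessed inputs through a stock torchvision ResNet-50.
# If its top-5 indices change between images, the inputs really do differ and
# the problem is likely in the Places365 weights/loading; if they don't change,
# the bug is probably in the preprocessing.
import torch
from torchvision.models import resnet50, ResNet50_Weights

ref_model = resnet50(weights=ResNet50_Weights.DEFAULT).to("cuda").eval()
with torch.no_grad():
    for img in (img_a, img_b):   # two different (1, 3, H, W) tensors on cuda
        print(ref_model(img).topk(5, dim=1).indices)
```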