tensor & positional encoding on different devices in Summer when using 3D positional encodings with batch size = 1
Hi,
I'm using the 3D positional encodings for PyTorch in a (shifted window) transformer model. My model was trained with a batch size of 8. In testing, the positional encodings work fine with a batch size > 1. With a batch size of 1, however, the positional encoding is on device `cpu` while my patch embedding is on device `cuda:0`, so the positional encoding can't be added to the patch embedding in the `Summer` class.
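For context, here is roughly how I'm applying the encoding (a minimal sketch; the `PositionalEncoding3D` class name, the `(batch, x, y, z, channels)` input layout, and the sizes are my assumptions/examples, not taken from my actual model):

```python
import torch
from positional_encodings.torch_encodings import PositionalEncoding3D, Summer

# Patch embeddings shaped (batch, x, y, z, channels); the model runs on cuda:0.
add_pos = Summer(PositionalEncoding3D(96)).to("cuda:0")
patches = torch.randn(1, 4, 4, 4, 96, device="cuda:0")  # batch_size = 1

# With batch_size = 1 the computed encoding ends up on 'cpu',
# so the addition inside Summer fails with a device mismatch.
out = add_pos(patches)
```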
I naively fixed the issue by replacing `return tensor + penc` with `return tensor + penc.to(tensor.device)` in `torch_encodings.py` (line 213). Is forcing the positional encoding onto the same device as the embedding a valid fix, and should it perhaps be added to the code?
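For reference, a minimal sketch of what the patched wrapper looks like on my side (not the package's exact code; only the `return` line is the actual change in `torch_encodings.py`):

```python
import torch.nn as nn

class Summer(nn.Module):
    """Adds the wrapped positional encoding to the input tensor."""

    def __init__(self, penc):
        super().__init__()
        self.penc = penc

    def forward(self, tensor):
        penc = self.penc(tensor)
        # Move the encoding to the embedding's device before adding, so a
        # CPU-resident encoding no longer breaks a CUDA-resident embedding.
        return tensor + penc.to(tensor.device)
```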
Best regards, Tjade
Hi Tjade,
Thanks for discovering this! Yes, I think this is valid. If you want, feel free to send me a PR for it. I think it would make sense to add this to all of the 1D, 2D, and 3D methods.