stereo-transformer
about the dimension of relative position encoding
Hi, nice work! I noticed that the 1D relative position encoding has dimension 2W-1. Why is it not W? I also wonder whether this makes STTR unable to handle inputs of arbitrary size, e.g. when the image is larger than 2W-1?
Hello @btwbtm,
Given a sequence of length W, there are 2W-1 distinct signed relative positions. For example, with W=3 the relative positions are -2, -1, 0, 1, 2, which gives 2*3-1=5.
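To make the counting concrete, here is a small sketch (the helper name `relative_positions` is hypothetical, not from the STTR code) that enumerates every signed offset `j - i` between two positions in a row of width W and confirms there are 2W-1 of them:

```python
def relative_positions(w: int) -> list[int]:
    """Return the sorted set of signed offsets j - i for positions i, j in a row of width w."""
    offsets = {j - i for i in range(w) for j in range(w)}
    return sorted(offsets)


print(relative_positions(3))       # [-2, -1, 0, 1, 2]
print(len(relative_positions(3)))  # 5, i.e. 2*3 - 1
```

The offsets range from -(W-1) to W-1, so an encoding needs one entry per offset rather than one per pixel, which is why its dimension is 2W-1 and not W.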
W is the width of the input image, so the size of the relative encoding follows the input instead of being fixed in advance. Therefore, STTR is able to handle images of arbitrary size.
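The sketch below illustrates why there is no hard size limit. It is a minimal example under stated assumptions, not STTR's actual implementation: it assumes a Transformer-style sinusoidal function of the offset, and the names `relative_encoding_table` and `lookup` are hypothetical. The table is computed from the W of the current input, so it always has 2W-1 rows, whatever W is.

```python
import math


def relative_encoding_table(w: int, dim: int) -> list[list[float]]:
    """Sinusoidal encoding (assumed, Transformer-style) for every offset in [-(w-1), w-1].

    Row k corresponds to offset k - (w - 1), so the table has 2*w - 1 rows.
    """
    assert dim % 2 == 0, "dim must be even to pair sin/cos channels"
    table = []
    for offset in range(-(w - 1), w):
        row = []
        for k in range(dim // 2):
            freq = 1.0 / (10000 ** (2 * k / dim))
            row.append(math.sin(offset * freq))
            row.append(math.cos(offset * freq))
        table.append(row)
    return table


def lookup(table: list[list[float]], w: int, i: int, j: int) -> list[float]:
    """Encoding for the offset j - i between positions i and j in a row of width w."""
    return table[(j - i) + (w - 1)]


# The table is rebuilt from each input's width, so any W works.
for w in (3, 64, 1000):
    t = relative_encoding_table(w, dim=8)
    print(w, len(t))  # 2*w - 1 rows
```

Because nothing is precomputed for a maximum width, a wider image simply produces a larger table at run time.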