stereo-transformer

about the dimension of relative position encoding

Open. XiaoyuShi97 opened this issue 3 years ago • 1 comment

Hi, nice work! I noticed that the 1D relative position encoding has dimension 2W-1. Why is it not W? I also wonder whether this makes STTR unable to handle input of arbitrary size, e.g. when the image is larger than 2W-1?

XiaoyuShi97, Oct 18 '21 04:10

hello @btwbtm

Given a sequence of length W, the signed relative positions take 2W-1 distinct values, from -(W-1) to W-1. For example, if you have W=3, the relative positions will be -2, -1, 0, 1, 2, making it 2*3-1=5.
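To make the counting concrete, here is a minimal Python sketch that enumerates the signed offsets for a given width. The helper names `relative_positions` and `relative_index` are made up for illustration and are not taken from the STTR code; the index shift by W-1 is just one common way to map offsets onto table rows.

```python
def relative_positions(width):
    """Return the sorted distinct signed offsets j - i for 0 <= i, j < width."""
    return sorted({j - i for i in range(width) for j in range(width)})


def relative_index(i, j, width):
    """Map the offset j - i to a row index in a table with 2 * width - 1 rows.

    Hypothetical helper: shifting by width - 1 makes the smallest offset,
    -(width - 1), land on row 0.
    """
    return (j - i) + (width - 1)


offsets = relative_positions(3)
print(offsets)       # [-2, -1, 0, 1, 2]
print(len(offsets))  # 5, i.e. 2 * 3 - 1
```

With W=3, `relative_index(0, 2, 3)` gives 4 (the last row) and `relative_index(2, 0, 3)` gives 0 (the first row), so every pair of positions maps into the 2W-1 rows.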

W is the width of the input image, so the number of relative positions, 2W-1, follows from the input size. Therefore, STTR is able to handle arbitrary image sizes.

mli0603, Nov 17 '21 15:11