RegionProxy
How do h, w in the paper and the F.unfold() function in the code work?
1. About h, w: The sentence "(Hh) × (Ww) matches the size of the output segmentation map and (h, w) is the relative stride of the initial token grid" in the paper indicates that h, w is the downsampling stride of the segmentation map. But when reading the code, I am confused about how this works: by rearranging token_logits and applying a matrix multiplication, we get the final segmentation map, which is as large as the input image. So why do you set the extra parameters h and w, and how do h, w relate to the stride?
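To make the shape arithmetic in this question concrete, here is a small sketch of how I read the quoted sentence. The patch size of 16, the 512×512 input, and h = w = 4 are assumptions chosen for illustration, not values taken from the repository, and `output_map_size` is a hypothetical helper. Whether any further upsampling to the input size happens afterwards is not something this sketch covers.

```python
def output_map_size(H, W, h, w):
    """Size of the (Hh) x (Ww) map when each token cell covers h x w output pixels."""
    return H * h, W * w


# Assumed values for illustration only (not taken from the official code).
patch = 16                    # stride of the initial token grid w.r.t. the input image
img_h, img_w = 512, 512       # input image size
H, W = img_h // patch, img_w // patch   # token grid size
h, w = 4, 4                   # relative stride of the token grid w.r.t. the output map

out_h, out_w = output_map_size(H, W, h, w)
out_stride = patch // h       # stride of the output map w.r.t. the input image

print(H, W, out_h, out_w, out_stride)
```

Under these assumptions the token grid is 32×32 and the (Hh) × (Ww) map is 128×128, so it would be smaller than the input unless h and w equal the patch size.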
2. About F.unfold()
Official implementation code:
token_logits = F.unfold(token_logits, kernel_size=3, padding=1).reshape(B, -1, 9, H, W) # (B, C, 9, H, W)
Pseudocode in the paper:
# get neighbors for each cell
y = rar(y, "B N K -> B K H W")
nb = im2col(y, kernel_size=3, padding=1)
nb = rar(nb, "B (K n) (H W) -> B H W n K")
My other question is what F.unfold() does in the code. In the paper, you show the process of the proxy head in pseudocode and say that im2col (i.e. F.unfold()) is used to get the neighbors of each cell. I cannot understand this well either.
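For reference, here is a minimal sketch of what that F.unfold() call appears to do on a toy tensor. The small shapes and the variable names `cols`, `neighbors`, and `center` are my own, chosen for illustration; only the unfold-and-reshape line mirrors the official code. With kernel_size=3 and padding=1, every grid cell gets its zero-padded 3×3 window, flattened into a new axis of length 9.

```python
import torch
import torch.nn.functional as F

# Toy sizes (assumed for illustration): batch 1, 2 channels, 3x4 token grid.
B, C, H, W = 1, 2, 3, 4
token_logits = torch.arange(B * C * H * W, dtype=torch.float32).reshape(B, C, H, W)

# F.unfold returns (B, C * 9, H * W): one flattened 3x3 window per grid cell.
cols = F.unfold(token_logits, kernel_size=3, padding=1)

# Same reshape as in the official line: (B, C, 9, H, W).
neighbors = cols.reshape(B, -1, 9, H, W)

# Index k = ky * 3 + kx walks the window row by row, so k = 4 is the cell itself.
center = neighbors[:, :, 4]
print(torch.equal(center, token_logits))

# k = 0 is the top-left neighbor; at the corner (0, 0) it falls in the zero padding.
print(neighbors[0, 0, 0, 0, 0].item())

# k = 5 is the right-hand neighbor: cell (1, 1) sees the value at (1, 2).
print(neighbors[0, 0, 5, 1, 1].item(), token_logits[0, 0, 1, 2].item())
```

So, as far as I can tell, `neighbors[b, c, k, y, x]` holds the value of channel `c` at the k-th position of the 3×3 window centered on cell `(y, x)`, which seems to match the `n` axis in the paper's `B H W n K` layout.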
Looking forward to your reply!!! Thank you ~~~