AbSViT

Questions about top_down_transform

Open • wefwefWEF2 opened this issue 1 year ago • 1 comment

Hi, thanks a lot for your great work. I have a question about top_down_transform.

In the code below, why do we multiply masked_x by top_down_transform again, given that we have already selected the relevant features?

top_down_transform = prompt[..., None] @ prompt[..., None].transpose(-1, -2)
x = x @ top_down_transform * 5

wefwefWEF2, Jan 15 '24 16:01

Hi, that's a good question. This step selects the relevant features along the channel dimension, whereas the previous selection operates on the spatial dimension. We find that selecting on both dimensions enhances the effect of top-down attention.

bfshi, Mar 21 '24 17:03
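
For anyone reading along, here is a minimal NumPy sketch of what the two quoted lines compute. It is not the repository's code: the shapes (batch B, tokens N, channels C), the random inputs, and the assumption that prompt has shape (B, C) are all hypothetical, and np.swapaxes stands in for the tensor .transpose(-1, -2) call. Since top_down_transform is the outer product of prompt with itself, x @ top_down_transform gives each token's dot product with prompt, multiplied by prompt. In other words, every token is projected onto the prompt direction in channel space, which is one way to read "selecting on the channel dimension".

```python
import numpy as np

rng = np.random.default_rng(0)
B, N, C = 2, 4, 8  # hypothetical batch size, token count, channel count

x = rng.standard_normal((B, N, C))     # stand-in for the spatially masked tokens
prompt = rng.standard_normal((B, C))   # assumed shape: one prompt vector per sample

# Outer product prompt * prompt^T per sample: (B, C, 1) @ (B, 1, C) -> (B, C, C)
top_down_transform = prompt[..., None] @ np.swapaxes(prompt[..., None], -1, -2)

# Same expression as in the issue; @ and * share precedence and associate
# left to right, so this is (x @ top_down_transform) * 5
out = x @ top_down_transform * 5

# Equivalent view: per-token dot product with prompt, then scale prompt by it
coeff = x @ prompt[..., None]              # (B, N, 1)
out_proj = 5 * coeff * prompt[:, None, :]  # (B, N, C)

print(np.allclose(out, out_proj))
```

Under these assumptions, channels aligned with prompt are kept and everything orthogonal to it is zeroed out, while the earlier mask decides which tokens contribute at all.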