AbSViT
questions about top_down_transform
Hi, thanks a lot for your great work. I have a question about top_down_transform.
Why do we multiply masked_x by top_down_transform again here, when we have already obtained the selected features?
top_down_transform = prompt[..., None] @ prompt[..., None].transpose(-1, -2)
x = x @ top_down_transform * 5
Hi, that's a good question. This part selects the relevant features along the channel dimension, whereas the previous selection is along the spatial dimension. We find that selecting on both dimensions can enhance the effect of top-down attention.
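To make the channel-wise selection concrete, here is a minimal sketch in NumPy rather than PyTorch. It is not the repository's code: the shapes (`B`, `N`, `C`), the random inputs, and the assumption that `prompt` holds one `C`-dimensional vector per sample are all made up for illustration. It shows that `prompt[..., None] @ prompt[..., None]^T` is a rank-1 `C x C` matrix, so multiplying by it projects every token onto the prompt direction in channel space.

```python
import numpy as np

rng = np.random.default_rng(0)
B, N, C = 2, 4, 8                      # hypothetical batch, tokens, channels
x = rng.standard_normal((B, N, C))     # stand-in for the spatially masked tokens
prompt = rng.standard_normal((B, C))   # assumed shape: one prompt vector per sample

# Outer product of the prompt with itself: a rank-1 (C x C) matrix per sample.
col = prompt[..., None]                                   # (B, C, 1)
top_down_transform = col @ np.swapaxes(col, -1, -2)       # (B, C, C)

# Same expression as in the thread; evaluated as (x @ transform) * 5.
out = x @ top_down_transform * 5                          # (B, N, C)

# Equivalent view: each token's dot product with the prompt, times the prompt.
coeff = np.einsum("bnc,bc->bn", x, prompt)                # (B, N)
out_alt = 5 * coeff[..., None] * prompt[:, None, :]       # (B, N, C)

print(np.allclose(out, out_alt))
```

In this reading, channels that do not align with the prompt contribute nothing to the output, which is one way to interpret "selecting on the channel dimension". If the prompt is not unit-norm, the result is additionally scaled by its squared norm.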