
The specific role of the decoder in the transformer structure

Open ANdong-star opened this issue 2 years ago • 2 comments

Hi! In your paper you said: "In the encoder-decoder attention module, the target query can attend to all positions on the template and the search region features, thus learning robust representations for the final bounding box prediction." How should I understand this? It's quite abstract to me. Thanks in advance for your reply!

ANdong-star avatar Nov 10 '21 14:11 ANdong-star

@ANdong-star Hi, this process is quite similar to the one in the DETR decoder. In DETR, 100 object queries interact with the image features output by the encoder. In STARK, a single target query interacts with the joint template-search features to extract the target information. Finally, the box prediction head integrates the outputs of the encoder and the decoder to predict the final box.

MasterBin-IIAU avatar Nov 30 '21 01:11 MasterBin-IIAU
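For anyone else landing on this issue, here is a minimal PyTorch sketch of that interaction, not STARK's actual code. The dimensions (`d_model`, `len_template`, `len_search`), the single `nn.TransformerDecoderLayer`, and the final similarity computation are illustrative assumptions, chosen only to show one query cross-attending over the concatenated template-search features:

```python
import torch
import torch.nn as nn

# Illustrative sizes, not the paper's exact settings.
d_model = 256        # feature channels
len_template = 64    # flattened template tokens (e.g., an 8x8 feature map)
len_search = 400     # flattened search-region tokens (e.g., a 20x20 feature map)

# Encoder output: joint template + search features,
# shape (seq_len, batch, d_model) in PyTorch's default layout.
enc_out = torch.randn(len_template + len_search, 1, d_model)

# A single learnable target query, analogous to DETR's 100 object queries.
target_query = nn.Parameter(torch.zeros(1, 1, d_model))

# In the encoder-decoder attention of this layer, the target query attends
# to ALL positions of the joint template-search features ("memory").
decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=8)
dec_out = decoder_layer(tgt=target_query, memory=enc_out)  # (1, 1, d_model)

# Sketch of how encoder and decoder outputs can be combined downstream:
# a similarity map between the query embedding and every encoder position.
similarity = torch.einsum("qbd,sbd->bqs", dec_out, enc_out)  # (1, 1, seq_len)
print(similarity.shape)
```

The key point is that the target query is one learned embedding (versus DETR's 100 object queries), and the cross-attention lets it gather target information from every position of the template and search-region features. In the paper, a similarity of this kind is used to reweight the search-region features before the corner-prediction head, which is why the final box depends on both the encoder and the decoder outputs.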

got it! thanks!

ANdong-star avatar Dec 02 '21 08:12 ANdong-star