Owen O'Connor
Hi, nice work. I see that you use the highest-resolution backbone feature map and encoder feature map to generate the pixel embedding map. Did you try including other feature...
Are you able to visualize the encoder-decoder multi-head attention weights with deformable attention, similar to what was done in the original DETR paper? I know the Deformable DETR paper was...
Hi, really nice work! I am curious how the transformer is able to differentiate between the track and object queries. I understand that you use TAN to update track queries...