trackformer Use of args.multi_frame_attention

Hi @timmeinhardt , thanks so much for this great work!

While trying to reproduce the results for MOTS20, I noticed some differences between your DeformableDETR and the DETR implementations.

Could you explain the use of args.multi_frame_attention in the adjusted DeformableDETR? I'm wondering why it is not used in the DETR-based model for mask tracking.

Is multi-frame attention not necessary to utilise track queries in the model? I read section 4.2 in the paper, but I'm still a bit confused.

Oct 17 '22 10:10 tragians

We provide the MOTS20 results for the old model because deformable attention seemed to perform worse for segmentation. Multi-frame and multi-scale training were not part of the old model. However, there is no reason why the multi-frame approach could not work for segmentation.

Oct 28 '22 18:10 timmeinhardt

Thank you very much for your detailed answer!

I have a follow-up question about the slightly modified Transformer class you introduce. What is the parameter track_attention used for, and was it used during training?

Nov 25 '22 15:11 tragians

The track_attention parameter is a legacy option and was not used in any of the training runs.

Nov 30 '22 13:11 timmeinhardt
