trackformer
settings of multi-frame and number of classes (20)
Hi @timmeinhardt , thanks for your great work!
After checking the code, I found that (1) the number of classes is set to 20 even though only the person class is tracked; (2) multi-frame attention is performed, but it is not discussed in the paper or in this repo. So here are my questions: why did you set the number of classes to 20? Does the multi-frame attention contribute a lot to the performance gain?
Thanks.
(1): Computing the focal loss for a single class introduces some noise, which we found to be reduced when we increase the number of classes. The number 20 is a bit arbitrary here. (2): We mention the multi-frame attention in the implementation details. However, since it is not a key element of our contribution or of how track queries work, we did not provide an in-depth discussion. In particular, identity preservation (IDF1 and ID switches) benefits a lot from it.
Hi @timmeinhardt ,
Thanks for your great work!
I found that the number of classes is incremented by 1 when the classifier is defined. https://github.com/timmeinhardt/trackformer/blob/d62d81023dbffb4a1820db39ce527b66df6d7b61/src/trackformer/models/detr.py#L37
In the post-processing, the appended last class is removed only at this line, but the loss computation does not remove it. https://github.com/timmeinhardt/trackformer/blob/d62d81023dbffb4a1820db39ce527b66df6d7b61/src/trackformer/models/detr.py#L476
I am curious why you chose these settings. Do they have an influence on the tracking performance?
Another question: the model is optimized with focal loss, which means that a sigmoid is used to activate the predicted logits. However, in the post-processing, a softmax is used to activate the logits and obtain the prediction scores. Is this a bug, or a design choice based on your experimental findings?
The additional class is added for background prediction in the original DETR formulation, which also means including that class in the loss. However, when running with focal loss, i.e., in the Deformable DETR formulation, we do not add the additional class. See this line, where we subtract from the number of classes for focal loss:
https://github.com/timmeinhardt/trackformer/blob/d62d81023dbffb4a1820db39ce527b66df6d7b61/src/trackformer/models/init.py#L34
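To make the bookkeeping concrete, here is a minimal sketch of how the two steps could cancel out. It assumes that exactly one is subtracted in the focal-loss case; the function names `configured_num_classes` and `classifier_out_dim` are hypothetical and not taken from the repository.

```python
def configured_num_classes(dataset_num_classes: int, focal_loss: bool) -> int:
    """Hypothetical helper: class count passed on to the model builder."""
    # Assumption: with focal loss, one class is subtracted up front so that
    # the "+ 1" in the classifier head does not create a background logit.
    return dataset_num_classes - 1 if focal_loss else dataset_num_classes


def classifier_out_dim(num_classes: int) -> int:
    """Hypothetical helper: number of logits produced by the class head."""
    # The head is always defined with one extra output (see the detr.py link).
    return num_classes + 1


# Original DETR formulation: 20 classes plus one background logit.
detr_dim = classifier_out_dim(configured_num_classes(20, focal_loss=False))

# Focal loss (Deformable DETR formulation): no extra background logit.
focal_dim = classifier_out_dim(configured_num_classes(20, focal_loss=True))

print(detr_dim, focal_dim)
```

Under this assumption, the focal-loss head ends up with exactly as many logits as configured classes, so the loss has no appended class to remove.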
Your second question is also related to the difference between DETR and Deformable DETR. When running the latter with focal loss, we do not apply a softmax in the post-processing. See this module:
https://github.com/timmeinhardt/trackformer/blob/d62d81023dbffb4a1820db39ce527b66df6d7b61/src/trackformer/models/deformable_detr.py#L286
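The difference between the two post-processing paths can be sketched for a single query in plain Python. This is only an illustration of the idea, not the repository's code: the function names are hypothetical, and it assumes the focal-loss path simply takes the best per-class sigmoid score.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def postprocess_detr(logits):
    """DETR-style: softmax over all logits, then drop the last (background) one."""
    probs = softmax(logits)[:-1]
    label = max(range(len(probs)), key=probs.__getitem__)
    return probs[label], label


def postprocess_focal(logits):
    """Focal-loss style (assumed): independent sigmoid per class, no background logit."""
    probs = [sigmoid(x) for x in logits]
    label = max(range(len(probs)), key=probs.__getitem__)
    return probs[label], label


# DETR head: two object classes plus one background logit.
detr_score, detr_label = postprocess_detr([2.0, 0.0, 1.0])

# Focal-loss head: two object classes, no background logit.
focal_score, focal_label = postprocess_focal([2.0, 0.0])
```

With the softmax path the object scores compete with the background logit, whereas with the sigmoid path each class score is independent, which matches how focal loss is applied during training.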