Tim Meinhardt
The additional class is added for background ("no-object") prediction in the original DETR formulation, which means this class is also included in the loss. However, when running with focal loss, i.e., in...
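The difference between the two classification setups can be sketched in plain NumPy. This is a toy illustration, not the repository's actual loss code, and the function names are made up: original DETR scores `num_classes + 1` logits with a softmax cross-entropy (the extra index being "no-object"), while the focal-loss variant uses one independent sigmoid per class, so background is simply "all classes off" and no extra class is needed.

```python
import numpy as np

def softmax_ce_with_background(logits, target):
    # Original DETR style: logits has num_classes + 1 entries,
    # the last index is the "no-object" (background) class.
    # Queries matched to no ground-truth object get target = num_classes.
    z = logits - logits.max()              # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum()
    return -np.log(probs[target])

def sigmoid_focal_loss(logits, target_onehot, alpha=0.25, gamma=2.0):
    # Focal-loss style: one sigmoid per class, no background class.
    # p_t is the probability of the correct decision per class.
    p = 1.0 / (1.0 + np.exp(-logits))
    p_t = p * target_onehot + (1.0 - p) * (1.0 - target_onehot)
    alpha_t = alpha * target_onehot + (1.0 - alpha) * (1.0 - target_onehot)
    ce = -np.log(np.clip(p_t, 1e-8, None))
    # The (1 - p_t)^gamma factor down-weights easy examples.
    return float((alpha_t * (1.0 - p_t) ** gamma * ce).sum())
```

With two real classes, the softmax variant works on 3 logits (2 + background), while the focal variant works on 2 logits and represents background implicitly by an all-zero target vector.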
This was necessary to apply a spatiotemporal encoding of the input pixels. [VisTR](https://github.com/Epiphqny/VisTR) did something similar for Video Instance Segmentation and increased their hidden size to 384. The spatiotemporal...
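As a rough sketch (my own reconstruction, not code from either repository): a spatiotemporal encoding typically extends DETR's 2D sine positional encoding to three axes, splitting the hidden dimension evenly across time, height, and width, which is why a hidden size divisible by 3 (such as VisTR's 384) is convenient.

```python
import numpy as np

def sine_encoding(positions, num_feats, temperature=10000.0):
    # Standard 1-D sine/cosine positional encoding; num_feats must be even.
    dim_t = temperature ** (2 * (np.arange(num_feats) // 2) / num_feats)
    pos = positions[:, None] / dim_t           # (N, num_feats)
    enc = np.empty_like(pos)
    enc[:, 0::2] = np.sin(pos[:, 0::2])
    enc[:, 1::2] = np.cos(pos[:, 1::2])
    return enc

def spatiotemporal_encoding(T, H, W, hidden=384):
    # Split the hidden size evenly across the t, y and x axes and
    # concatenate the per-axis encodings, broadcast to the full volume.
    d = hidden // 3
    t = sine_encoding(np.arange(T, dtype=float), d)  # (T, d)
    y = sine_encoding(np.arange(H, dtype=float), d)  # (H, d)
    x = sine_encoding(np.arange(W, dtype=float), d)  # (W, d)
    return np.concatenate([
        np.broadcast_to(t[:, None, None, :], (T, H, W, d)),
        np.broadcast_to(y[None, :, None, :], (T, H, W, d)),
        np.broadcast_to(x[None, None, :, :], (T, H, W, d)),
    ], axis=-1)                                      # (T, H, W, hidden)
```

Each pixel in the T×H×W feature volume then carries a unique encoding of its frame index and spatial position.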
The training and inference are not the same, but this is very common for MOT methods. Are you referring to a specific aspect?
Yes, the same model performs detection and tracking all within the attention layers of the decoder. Hence, both tasks are jointly end-to-end trainable.
Can you send the train and eval commands that you are executing?
These look fine. What results are you trying to obtain? In this issue you seem to be trying to reproduce the training set numbers: https://github.com/timmeinhardt/trackformer/issues/46#issuecomment-1229689775. The code is non-deterministic...
Did you retrain the model or use our pretrained model file? And what MOTA score did you obtain? Something around 73?
There is some noise w.r.t. the final scores, but it should not be 4 points as it is for your private detection results. Did you train separate models for...
Did you submit the results from our pretrained models and obtain the same numbers? Just to be sure it is not related to the evaluation part of the pipeline. How...
These changes can definitely make a difference, and you should mention any changes you made when reporting an issue. I would not touch the max_size parameters to fit the training...