
I cannot reproduce the results

loseevaya opened this issue 2 years ago · 9 comments

Hey, thanks for your excellent work! I trained TrackFormer with your default settings (loading from the pretrained CrowdHuman checkpoint) on the joint set of CrowdHuman and MOT17, and I get a result on MOT17 of about 74.0 MOTA. But when I submit to MOTChallenge, I only get 72.7 MOTA. I changed the batch_size to 1 and kept all other parameters unchanged. Why is this happening?

loseevaya avatar Mar 17 '23 09:03 loseevaya

Because you changed the batch size to 1. :) A different batch size means you must find new optimal learning rates and training epochs.

timmeinhardt avatar Mar 17 '23 12:03 timmeinhardt

> Because you changed the batch size to 1. :) A different batch size means you must find new optimal learning rates and training epochs.

What should I change the learning rates and epochs to when I set batch_size=1?

quxu91 avatar Apr 04 '23 12:04 quxu91

> Hey, thanks for your excellent work! I trained TrackFormer with your default settings (loading from the pretrained CrowdHuman checkpoint) on the joint set of CrowdHuman and MOT17, and I get a result on MOT17 of about 74.0 MOTA. But when I submit to MOTChallenge, I only get 72.7 MOTA. I changed the batch_size to 1 and kept all other parameters unchanged. Why is this happening?

I ran into the same problem. Have you reproduced the results? And which learning rates and epochs did you set?

quxu91 avatar Apr 04 '23 13:04 quxu91

I do not know. You have to find new optimal LRs and epochs. A recommended starting point would be to halve the learning rates, just as you did with the batch size from 2 to 1. But there is no guarantee this will yield the same results. In fact, we tried working with batch_size=1 for a while but never achieved the same top performance as with batch_size=2.

timmeinhardt avatar Apr 04 '23 13:04 timmeinhardt
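For concreteness, here is a minimal sketch of the halving suggested above. The key names follow the train.yaml parameters discussed later in this thread, but the default values are placeholders, not repo-verified numbers:

```python
# Sketch of linearly scaling learning rates when changing the batch size.
# TrackFormer's default training uses batch_size=2; when dropping to
# batch_size=1, a common starting point is to scale the LRs by the same factor.

default_batch_size = 2
new_batch_size = 1
scale = new_batch_size / default_batch_size  # 0.5

# Placeholder values; check cfgs/train.yaml for the actual defaults.
lrs = {
    "lr": 2e-4,
    "lr_backbone": 2e-5,
    "lr_track": 1e-4,
}

scaled_lrs = {name: value * scale for name, value in lrs.items()}
print(scaled_lrs)  # e.g. {'lr': 1e-4, 'lr_backbone': 1e-5, 'lr_track': 5e-5}
```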

Thanks for your quick reply. Should I halve all the LRs (including lr, lr_backbone, lr_track, and lr_linear_proj_mult in train.yaml)? Should I also change the weight_decay? And why do the results differ between batch sizes?

quxu91 avatar Apr 04 '23 13:04 quxu91

Only the learning rates, not the multipliers (lr_linear_proj_mult). The weight decay can remain as it is.

What appears different with different batch sizes? Do you mean why you have to set different LRs for different batch sizes?

timmeinhardt avatar Apr 04 '23 13:04 timmeinhardt
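To illustrate the answer above: in DETR-style code bases, lr_linear_proj_mult is typically applied as a multiplier on the base lr when the optimizer's parameter groups are built, so halving lr already halves the effective rate for those parameters. A rough Python sketch of that grouping (not the exact TrackFormer code; the name filters "reference_points" and "sampling_offsets" are assumptions):

```python
import torch


def build_param_groups(model, lr, lr_backbone, lr_linear_proj_mult):
    # Assumed name filters for the linear-projection parameters.
    linear_proj_names = ["reference_points", "sampling_offsets"]
    backbone, linear_proj, rest = [], [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        if "backbone" in name:
            backbone.append(p)
        elif any(k in name for k in linear_proj_names):
            linear_proj.append(p)
        else:
            rest.append(p)
    return [
        {"params": rest, "lr": lr},
        {"params": backbone, "lr": lr_backbone},
        # The multiplier scales the base lr, so it does not need to be halved itself.
        {"params": linear_proj, "lr": lr * lr_linear_proj_mult},
    ]


# Example usage (placeholder values); weight_decay stays unchanged:
# optimizer = torch.optim.AdamW(
#     build_param_groups(model, lr=1e-4, lr_backbone=1e-5, lr_linear_proj_mult=0.1),
#     weight_decay=1e-4,
# )
```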

> Only the learning rates, not the multipliers (lr_linear_proj_mult). The weight decay can remain as it is.
>
> What appears different with different batch sizes? Do you mean why you have to set different LRs for different batch sizes?

Yes! Since I set batch_size to 1, are there any other parameters besides the learning rates mentioned above that could affect the results?

quxu91 avatar Apr 04 '23 13:04 quxu91

Explaining the relation between batch size and learning rate goes beyond the support of this repository. :)

You might have to adjust the number of epochs. But again, you will most likely not get the same results easily. This requires some potentially expensive hyperparameter tuning.

timmeinhardt avatar Apr 04 '23 13:04 timmeinhardt

I get it! Thanks for your enthusiastic answer anyway!

quxu91 avatar Apr 04 '23 13:04 quxu91