Camera-based-Person-ReID

Why is momentum set to None?

Open · HeliosZhao opened this issue Sep 09 '20 · 7 comments

Hi,

First and foremost, thanks for your code!

As shown in your code, you set the momentum of BN to None. In the testing stage, this means that:

running mean = mean of the last mini-batch
running var = var of the last mini-batch

So I wonder why the number of mini-batches influences your results.

I think this introduces randomness: if you happen to choose the best mini-batch, you will get the best results, even if you use only one mini-batch to calculate the running mean and var of the test camera.

HeliosZhao · Sep 09 '20

Hi. Thanks for your question. momentum=None is not equivalent to momentum=0.0. momentum=None is a feature specific to PyTorch: it computes a cumulative moving average. Please check https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html for more details.
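For illustration, here is a minimal standalone sketch of the difference (not code from this repo):

```python
import torch
import torch.nn as nn

# With momentum=0.1 the running stats are an exponential moving average,
# dominated by the most recent batches. With momentum=None they are a
# cumulative (simple) average over all batches seen since the last reset.
bn_ema = nn.BatchNorm2d(3, momentum=0.1)
bn_cma = nn.BatchNorm2d(3, momentum=None)

bn_ema.train()
bn_cma.train()
for _ in range(10):
    x = torch.randn(8, 3, 4, 4)
    bn_ema(x)
    bn_cma(x)

print(bn_ema.running_mean)  # weighted toward the last batches
print(bn_cma.running_mean)  # equal weight for every batch
```

In particular, with momentum=None the running statistics are not simply those of the last mini-batch.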

automan000 · Sep 09 '20

Thanks a lot, I found it:

> momentum – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average)
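Concretely (if I read the docs correctly): with momentum m, the update is running_stat ← (1 − m) · running_stat + m · batch_stat, an exponential moving average; with momentum=None, PyTorch instead uses the factor 1/num_batches_tracked, so running_stat ends up being the simple average of all batch statistics since the last reset.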

So, have you tried a different momentum, like 0.1?

HeliosZhao · Sep 09 '20

I did not test momentum=0.1. I think the simple average is a better choice since it is robust to the order of mini-batches, although it differs slightly from what is done in the training stage. Please feel free to share your results. Thanks a lot.
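As a quick sanity check on the order-robustness claim, a standalone sketch (not code from this repo):

```python
import torch
import torch.nn as nn

# Check that momentum=None running stats do not depend on batch order.
torch.manual_seed(0)
batches = [torch.randn(8, 3, 4, 4) for _ in range(5)]

def running_mean_after(batch_list):
    bn = nn.BatchNorm2d(3, momentum=None)
    bn.train()
    for b in batch_list:
        bn(b)
    return bn.running_mean.clone()

same = torch.allclose(running_mean_after(batches),
                      running_mean_after(batches[::-1]))
print(same)  # True (up to floating point): the cumulative average ignores order
```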

automan000 · Sep 09 '20

Hi, I tried momentum=0.1 and it performed badly. momentum=None is a better choice.

HeliosZhao · Sep 13 '20

Hi, thanks for the update. FYI, to reproduce the results in our paper, each experiment should be conducted 10 times. I will leave this issue open so that more people can see your results.

automan000 · Sep 13 '20

> Hi, thanks for the update. FYI, to reproduce the results in our paper, each experiment should be conducted 10 times. I will leave this issue open so that more people can see your results.

As you stated in another issue, you used momentum=None during training. I tried momentum=0.1 and momentum=None, and found that with momentum=0.1 the cross-dataset performance is much better than with momentum=None if the model is not updated at test time. However, if the model is updated at test time, the result is the opposite. Why?

justopit · Dec 27 '20

@justopit Please give more information about your training and testing sets. For Market, Duke, and MSMT, I get the same results as @HeliosZhao: momentum=None is better than momentum=0.1. This post may also provide you with useful information.

Meanwhile, please report results averaged over 10 repeated runs of each experiment.
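For context on what "updating the model when testing" means in this thread, here is a minimal sketch of re-estimating BN statistics on target-camera data; adapt_bn_stats and loader are hypothetical names, not this repo's API:

```python
import torch
import torch.nn as nn

def adapt_bn_stats(model, loader, device="cpu"):
    # Reset BN running stats; with momentum=None the re-estimated stats
    # are a simple average over all adaptation batches, regardless of order.
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.reset_running_stats()
            m.momentum = None
    model.train()           # BN updates its running stats only in train mode
    with torch.no_grad():   # forward passes only; no weight updates
        for images in loader:
            model(images.to(device))
    model.eval()
```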

automan000 · Dec 27 '20