
The MSML loss stays at 0.6

Open Hydraz320 opened this issue 7 years ago • 7 comments

Sorry to bother you, but I finished the loaddata function and used the MSML loss function. However, flatten_loss (which is the MSML loss) somehow stays at alpha, which is set to 0.6 in the original code. I think this means the positive distance equals the negative distance, so the loss reduces to the larger value, alpha. Is that normal? The overall loss is not decreasing either. Thanks again!
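
To show what I mean, here is a minimal NumPy sketch of my understanding of MSML (hardest positive distance minus hardest negative distance plus margin); the helper names are mine, not this repo's code:

    import numpy as np

    def msml_loss(dist_mat, pos_mask, alpha=0.6):
        """dist_mat: (N, N) pairwise distances; pos_mask: True for same-ID pairs."""
        hardest_pos = dist_mat[pos_mask].max()    # largest same-ID distance (diagonal kept for brevity)
        hardest_neg = dist_mat[~pos_mask].min()   # smallest cross-ID distance
        return max(hardest_pos - hardest_neg + alpha, 0.0)

    # If the network maps every image to the same point, all distances are 0,
    # so hardest_pos == hardest_neg and the loss is exactly alpha:
    dist = np.zeros((8, 8))                                    # 2 IDs x 4 images each
    same_id = np.kron(np.eye(2), np.ones((4, 4))).astype(bool)
    print(msml_loss(dist, same_id))  # -> 0.6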

Hydraz320 avatar Jan 10 '18 06:01 Hydraz320

Same problem. In the Trihard loss, I used K.gradients to check every layer, and I found that this

    dis_mat = K.sum(K.square(delta), axis=2)
    dis_mat = K.sqrt(dis_mat) + 1e-8  # epsilon added only after the sqrt

will cause the gradients to become NaN: the derivative of sqrt(x) is 1/(2*sqrt(x)), which is infinite at x = 0, and the diagonal of the squared-distance matrix is exactly 0. Moving the epsilon inside the sqrt avoids this:

    dis_mat = K.sum(K.square(delta), axis=2) + K.epsilon()  # epsilon inside the sqrt
    dis_mat = K.sqrt(dis_mat)

In my case, the model then converges normally.
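
For completeness, a self-contained sketch of the fixed distance computation (standalone Keras 2.x backend; the function and tensor names are illustrative, not this repo's exact code):

    from keras import backend as K

    def pairwise_dist(embeddings):
        """Numerically stable pairwise Euclidean distances, (batch, dim) -> (batch, batch)."""
        # delta[i, j, :] = embeddings[i] - embeddings[j]
        delta = K.expand_dims(embeddings, 1) - K.expand_dims(embeddings, 0)
        sq = K.sum(K.square(delta), axis=2)
        # The epsilon goes *inside* the sqrt: the diagonal of sq is exactly 0,
        # and the gradient of sqrt at 0 is infinite, which is the NaN source.
        return K.sqrt(sq + K.epsilon())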

kardoszc avatar Mar 18 '18 07:03 kardoszc

It does work, thanks @kardoszc

shen-ee avatar Apr 02 '18 11:04 shen-ee

The network initialization has a great influence on MSML; an inappropriate initialization may result in NaN. I always train the model for several epochs with a softmax loss first to initialize it.

@kardoszc gave us a good solution. Really, thanks!
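
For anyone who wants the warm-up trick, a rough sketch of the two-stage schedule (standalone Keras 2.x; base_model, num_ids, msml_loss, and the two generators are hypothetical placeholders, not code from this repo):

    from keras.models import Model
    from keras.layers import Dense

    # Stage 1: initialize the backbone with a softmax ID-classification head.
    features = base_model.output                  # (batch, dim) embedding tensor
    logits = Dense(num_ids, activation='softmax')(features)
    clf = Model(base_model.input, logits)
    clf.compile(optimizer='adam', loss='categorical_crossentropy')
    clf.fit_generator(id_generator, steps_per_epoch=500, epochs=5)

    # Stage 2: drop the classifier and fine-tune the embedding with MSML.
    embed = Model(base_model.input, features)
    embed.compile(optimizer='adam', loss=msml_loss)
    embed.fit_generator(pk_batch_generator, steps_per_epoch=500, epochs=50)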

michuanhaohao avatar Apr 24 '18 05:04 michuanhaohao

Hi @michuanhaohao, have you had any success training with MSML from scratch (instead of combining MSML with another loss)? If so, I would be curious to know the hyper-parameters that lead to convergence.

ergysr avatar Apr 30 '18 18:04 ergysr

@ergysr It may depend on the dataset. Without an additional softmax loss, I successfully trained the model on Market1501 but failed on CUHK03. I think the reason is that CUHK03 has two images for each person ID while I set K = 4 for MSML, so there were repeated images in a batch.
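
To make the K = 4 issue concrete, a minimal sketch of P*K batch sampling (pure Python; images_by_id is a hypothetical dict mapping person ID to image paths):

    import random

    def sample_pk_batch(images_by_id, batch_P, batch_K=4):
        """Draw batch_P identities and batch_K images per identity."""
        batch = []
        for pid in random.sample(list(images_by_id), batch_P):
            imgs = images_by_id[pid]
            if len(imgs) >= batch_K:
                batch += random.sample(imgs, batch_K)
            else:
                # IDs with fewer than K images (e.g. CUHK03 with K = 4) must be
                # sampled with replacement, which duplicates images in the batch.
                batch += random.choices(imgs, k=batch_K)
        return batch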

michuanhaohao avatar May 01 '18 06:05 michuanhaohao

Hi @michuanhaohao, may I ask what your MSML performance (mAP score) is on Market1501? I've tried, but I can't achieve any good result when training with MSML only. I used ResNet v1 50 (pre-trained on ImageNet) as the backbone model. Some of my hyper-parameters and implementation details:

- batch_K = 4, batch_P = 18 (each mini-batch contains 18 PIDs and 4 images per PID)
- lr = 1e-3 at first, then exponential decay with rate 0.1 every 10000 steps after step 10000 (sketched below)
- total training steps: 30000
- data augmentation: flip and random crop to size 256x128
- model architecture: resnet_v1_50 -> fc1024 -> fc128 (with l2_norm) as the embedding layer

Thank you.
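
Edit: to pin down the learning-rate schedule above, a tiny sketch (my reading of the decay; the constants are the ones listed):

    def learning_rate(step, base_lr=1e-3):
        """1e-3 until step 10000, then x0.1 for every further 10000 steps."""
        if step < 10000:
            return base_lr
        return base_lr * 0.1 ** ((step - 10000) / 10000.0)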

ca-joe-yang avatar Jul 14 '18 17:07 ca-joe-yang

I have the same problem, and it is very confusing. I used the ImageNet pre-trained weights; sometimes 1-5 epochs reach a good result, and sometimes nothing...

NobleYd avatar Aug 16 '18 02:08 NobleYd