
how to change m

[Open] zjz5250 opened this issue 6 years ago • 16 comments

At the beginning, I used m as "type: SINGLE"; the loss converged and the acc was about 0.98. But when I changed m to "type: QUADRUPLE", the loss became larger and larger, and the acc also went down. So how do I change the type correctly? Hoping for help.

zjz5250 avatar Mar 20 '18 06:03 zjz5250

Directly using QUADRUPLE is OK for CASIA, but for other datasets (for example, subsets of ms_celeb_1m) it is hard to converge. When finetuning QUADRUPLE from the result of SINGLE, set lr <= 0.01, base = 10, and lambda_min = 10. Try many times; it may converge.
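For concreteness, a minimal sketch of where these knobs live, assuming the stock sphereface MarginInnerProduct layer (layer/blob names and num_output are placeholders for your own net; only the margin_inner_product_param values reflect the settings discussed in this thread):

```
layer {
  name: "fc6"
  type: "MarginInnerProduct"
  bottom: "fc5"
  bottom: "label"
  top: "fc6"
  top: "lambda"
  margin_inner_product_param {
    num_output: 10572      # number of identities in the training set (placeholder)
    type: QUADRUPLE        # the "m" being discussed: SINGLE / DOUBLE / TRIPLE / QUADRUPLE
    base: 10               # lowered from the usual 1000 when finetuning from SINGLE
    gamma: 0.12
    power: 1
    lambda_min: 10
    iteration: 0
    weight_filler { type: "xavier" }
  }
}
```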

zuoqing1988 avatar Mar 20 '18 07:03 zuoqing1988

thanks

zjz5250 avatar Mar 20 '18 08:03 zjz5250

@zuoqing1988 Yes, when I set the lr and base as you said, it worked; the loss clearly began to converge. Could you tell me the rule for how to change the lr and base when the type changes? Thx

zjz5250 avatar Mar 20 '18 08:03 zjz5250

@zjz5250 Finetuning QUADRUPLE from SINGLE with lr = 0.01, base = 10, lambda_min = 10 will be OK with very high probability. But if it diverges, try a smaller lr. Once QUADRUPLE converges with some lambda_min (10), you can also finetune with a smaller lambda_min (5, 2 or 1). You can try the argument "-snapshot xxx_iter_xxx.caffemodel".
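For reference, a sketch of how such a run is typically launched with the stock caffe tool (solver and model file names are placeholders). Note that stock caffe finetunes from a .caffemodel via --weights and resumes full solver state from a .solverstate via --snapshot; if your fork accepts a .caffemodel for -snapshot, follow its convention instead:

```
# Finetune QUADRUPLE starting from previously trained SINGLE weights:
./build/tools/caffe train --solver=sphereface_solver.prototxt \
    --weights=single_iter_28000.caffemodel

# Resume an interrupted run, restoring momentum and the iteration counter too:
./build/tools/caffe train --solver=sphereface_solver.prototxt \
    --snapshot=quadruple_iter_28000.solverstate
```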

zuoqing1988 avatar Mar 20 '18 08:03 zuoqing1988

@zuoqing1988 Thanks very much! I did as you said, and it worked. But when I tried to train a model on a big dataset of about 5 million people, the loss could not converge again. Can you tell me how to set the lr, base, and lambda_min? Appreciated!

zjz5250 avatar Mar 21 '18 01:03 zjz5250

@zjz5250 As far as I know, nobody in this forum has succeeded in training QUADRUPLE on such a big dataset.

zuoqing1988 avatar Mar 21 '18 01:03 zuoqing1988

@zuoqing1988 So if I train SINGLE, DOUBLE, or TRIPLE, can it converge? And how should I set the lr, base, and lambda_min?

zjz5250 avatar Mar 21 '18 02:03 zjz5250

@zjz5250 I have trained on cleaned subsets of MS-Celeb-1M, around 80,000 people. SINGLE converges easily, but the acc on LFW is less than 99%. TRIPLE and QUADRUPLE are hard to converge (#14). The largest dataset I have succeeded in training with QUADRUPLE includes around 30,000 people, 2 million images.

zuoqing1988 avatar Mar 21 '18 02:03 zuoqing1988

@zuoqing1988 Sorry, I made a mistake. I meant that I want to train with a dataset of about 50,000 people, 110,000 pics. I set the solver parameters to:

base_lr: 0.001
momentum: 0.9
lr_policy: "multistep"
stepvalue: 32000
stepvalue: 48000
stepvalue: 60000
gamma: 0.1
weight_decay: 0.0005

and the margin parameters to:

base: 1000
gamma: 0.12
power: 1
lambda_min: 10
iteration: 0

but even SINGLE cannot converge. Can you tell me the details of these parameters when you trained on cleaned subsets of MS-Celeb-1M?

zjz5250 avatar Mar 21 '18 02:03 zjz5250

@zjz5250 I selected a subset of MS_Celeb_1M with 45,971 people and 3,645,724 images. It converges very fast for SINGLE (initial loss = 10.7; after 10,000 iters, loss = 7.0; after 20,000 iters, loss is less than 1.5). My solver parameters:

base_lr: 0.01
lr_policy: "multistep"
gamma: 0.1
stepvalue: 160000
stepvalue: 240000
max_iter: 50000
display: 100
momentum: 0.9
weight_decay: 0.0005

and the margin parameters:

base: 1000
gamma: 0.08
power: 1
lambda_min: 5
iteration: 0
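To be explicit about where each group goes (a sketch assuming the standard caffe layout; the net path is a placeholder): the first group are solver settings and belong in the solver prototxt, while base/gamma/power/lambda_min/iteration belong to the MarginInnerProduct layer in the network prototxt, not the solver:

```
# sphereface_solver.prototxt (sketch)
net: "sphereface_model.prototxt"   # placeholder path; the margin params live in this file
base_lr: 0.01
lr_policy: "multistep"
gamma: 0.1
stepvalue: 160000
stepvalue: 240000
max_iter: 50000    # as quoted above; note this is below the stepvalues
display: 100
momentum: 0.9
weight_decay: 0.0005
```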

zuoqing1988 avatar Mar 21 '18 05:03 zuoqing1988

@zuoqing1988 Hi, I am training QUADRUPLE with a dataset of about 10,000 people. I have tried a smaller lr many times; now the acc is about 0.4, but the loss is hard to converge. My lr is very small now, only 1e-08. My parameters are as follows:

base: 10
gamma: 0.08
power: 1
lambda_min: 10
iteration: 0

I have tried lambda_min: 5, but once I changed lambda_min from 10 to 5, the acc dropped to 0 immediately. What were your acc and loss at the end when you trained with CASIA?

zjz5250 avatar Mar 22 '18 06:03 zjz5250

@zjz5250 If lambda_min = 5, the loss is less than 1.0 after convergence, and acc > 97% for SINGLE, acc > 99% for QUADRUPLE. lr = 1e-08 is too small; the smallest value I have used is 1e-05. Maybe you should provide more images for each person. In my experiments, each person has at least 50 images.

zuoqing1988 avatar Mar 22 '18 07:03 zuoqing1988

@zuoqing1988 Can lambda_min be a large number? For example: base: 50, lambda_min: 50.

We tried training like this and found the loss can converge to 0.85 for DOUBLE on the MS_Celeb_1M dataset.

zjz5250 avatar Apr 10 '18 05:04 zjz5250

@zuoqing1988 I tried your proposal for base, lambda, and lr, and it does not work. SO UPSET!! In the layer "MarginInnerProduct", besides parameters like "base, gamma, lambda", there is a special parameter "iteration", whose default value is 0. Do you know what this parameter means, and does it affect the result of the finetune?

MengWangTHU avatar Apr 10 '18 06:04 MengWangTHU

@MengWangTHU lambda = max(lambda_min, base * (1 + gamma * iteration)^(-power))
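A worked example of that schedule, using the base = 1000, gamma = 0.12, power = 1, lambda_min = 5 values quoted earlier in this thread (the iteration counts are illustrative):

```
iteration = 0:     lambda = max(5, 1000 * (1 + 0.12 * 0)^-1)    = 1000
iteration = 100:   lambda = max(5, 1000 * (1 + 0.12 * 100)^-1)  ≈ 76.9
iteration = 2000:  lambda = max(5, 1000 * (1 + 0.12 * 2000)^-1) = 5  (floored at lambda_min)
```

If the layer follows the stock sphereface implementation, this counter starts at the prototxt's iteration value and increments once per training iteration, so leaving iteration: 0 when finetuning restarts the annealing from lambda = base.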

@zjz5250 A larger lambda_min leads to lower accuracy, but it is easier to converge.

zuoqing1988 avatar Apr 10 '18 07:04 zuoqing1988

@zuoqing1988

What do you mean by "large" for datasets?

"Large" as in a large number of identities, or a large number of images?

twmht avatar Jul 03 '18 08:07 twmht