
how to change m

[Open] zjz5250 opened this issue 6 years ago • 16 comments

At the beginning, I used m as "type: SINGLE"; the loss converged and the acc was about 0.98. But when I changed m to "type: QUADRUPLE", the loss became larger and larger, and the acc also went down. So how do I change the type correctly? Hoping for help.

zjz5250 avatar Mar 20 '18 06:03 zjz5250

Directly using QUADRUPLE is OK for CASIA, but for other datasets (for example, subsets of ms_celeb_1m) it is hard to converge. When finetuning QUADRUPLE from the result of SINGLE, set lr <= 0.01, base = 10, and lambda_min = 10. Try many times; it may converge.
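For concreteness, a minimal sketch of where these knobs live, assuming the stock sphereface MarginInnerProduct layer (layer/blob names and num_output are placeholders for your own net; only the margin_inner_product_param values reflect the settings discussed in this thread):

```
layer {
  name: "fc6"
  type: "MarginInnerProduct"
  bottom: "fc5"
  bottom: "label"
  top: "fc6"
  top: "lambda"
  margin_inner_product_param {
    num_output: 10572      # number of identities in the training set (placeholder)
    type: QUADRUPLE        # the "m" being discussed: SINGLE / DOUBLE / TRIPLE / QUADRUPLE
    base: 10               # lowered from the usual 1000 when finetuning from SINGLE
    gamma: 0.12
    power: 1
    lambda_min: 10
    iteration: 0
    weight_filler { type: "xavier" }
  }
}
```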

zuoqing1988 avatar Mar 20 '18 07:03 zuoqing1988

thanks

zjz5250 avatar Mar 20 '18 08:03 zjz5250

@zuoqing1988 Yes, when I set the lr and base as you said, it worked; the loss clearly began to converge. Could you tell me the rule for how to change the lr and base when the type changes? Thx

zjz5250 avatar Mar 20 '18 08:03 zjz5250

@zjz5250 Finetuning QUADRUPLE from SINGLE with lr = 0.01, base = 10, lambda_min = 10 will be OK with very high probability. But if it diverges, try a smaller lr. Once QUADRUPLE converges with some lambda_min (10), you can also finetune with a smaller lambda_min (5, 2 or 1). You can try the argument "-snapshot xxx_iter_xxx.caffemodel".
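For reference, a sketch of how such a run is typically launched with the stock caffe tool (solver and model file names are placeholders). Note that stock caffe finetunes from a .caffemodel via --weights and resumes full solver state from a .solverstate via --snapshot; if your fork accepts a .caffemodel for -snapshot, follow its convention instead:

```
# Finetune QUADRUPLE starting from previously trained SINGLE weights:
./build/tools/caffe train --solver=sphereface_solver.prototxt \
    --weights=single_iter_28000.caffemodel

# Resume an interrupted run, restoring momentum and the iteration counter too:
./build/tools/caffe train --solver=sphereface_solver.prototxt \
    --snapshot=quadruple_iter_28000.solverstate
```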

zuoqing1988 avatar Mar 20 '18 08:03 zuoqing1988

@zuoqing1988 Thanks very much! I did as you said, and it worked. But when I tried to train a model on a big dataset of about 5 million people, the loss could not converge again. Can you tell me how to set the lr, base, and lambda_min? Appreciated!

zjz5250 avatar Mar 21 '18 01:03 zjz5250

@zjz5250 As far as I know, nobody in this forum has succeeded in training QUADRUPLE on such a big dataset.

zuoqing1988 avatar Mar 21 '18 01:03 zuoqing1988

@zuoqing1988 So if I train SINGLE, DOUBLE, or TRIPLE, can it converge? And how should I set the lr, base, and lambda_min?

zjz5250 avatar Mar 21 '18 02:03 zjz5250

@zjz5250 I have trained on cleaned subsets of MS-Celeb-1M, around 80,000 people. SINGLE converges easily, but the acc on LFW is less than 99%. TRIPLE and QUADRUPLE are hard to converge (#14). The largest dataset I have succeeded in training with QUADRUPLE includes around 30,000 people, 2 million images.

zuoqing1988 avatar Mar 21 '18 02:03 zuoqing1988

@zuoqing1988 Sorry, I made a mistake. I meant that I want to train with a dataset of about 50,000 people, 110,000 pics. I set the solver parameters to:

base_lr: 0.001
momentum: 0.9
lr_policy: "multistep"
stepvalue: 32000
stepvalue: 48000
stepvalue: 60000
gamma: 0.1
weight_decay: 0.0005

and the margin parameters to:

base: 1000
gamma: 0.12
power: 1
lambda_min: 10
iteration: 0

but even SINGLE cannot converge. Can you tell me the details of these parameters when you trained on cleaned subsets of MS-Celeb-1M?

zjz5250 avatar Mar 21 '18 02:03 zjz5250

@zjz5250 I selected a subset of MS_Celeb_1M with 45,971 people and 3,645,724 images. It converges very fast for SINGLE (initial loss = 10.7; after 10,000 iters, loss = 7.0; after 20,000 iters, loss is less than 1.5). My solver parameters:

base_lr: 0.01
lr_policy: "multistep"
gamma: 0.1
stepvalue: 160000
stepvalue: 240000
max_iter: 50000
display: 100
momentum: 0.9
weight_decay: 0.0005

and the margin parameters:

base: 1000
gamma: 0.08
power: 1
lambda_min: 5
iteration: 0
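To be explicit about where each group goes (a sketch assuming the standard caffe layout; the net path is a placeholder): the first group are solver settings and belong in the solver prototxt, while base/gamma/power/lambda_min/iteration belong to the MarginInnerProduct layer in the network prototxt, not the solver:

```
# sphereface_solver.prototxt (sketch)
net: "sphereface_model.prototxt"   # placeholder path; the margin params live in this file
base_lr: 0.01
lr_policy: "multistep"
gamma: 0.1
stepvalue: 160000
stepvalue: 240000
max_iter: 50000    # as quoted above; note this is below the stepvalues
display: 100
momentum: 0.9
weight_decay: 0.0005
```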

zuoqing1988 avatar Mar 21 '18 05:03 zuoqing1988

@zuoqing1988 Hi, I am training QUADRUPLE with a dataset of about 10,000 people. I have tried a smaller lr many times; now the acc is about 0.4, but the loss is hard to converge. My lr is very small now, only 1e-08. My parameters are as follows:

base: 10
gamma: 0.08
power: 1
lambda_min: 10
iteration: 0

I have tried lambda_min: 5, but once I changed lambda_min from 10 to 5, the acc dropped to 0 immediately. What were your acc and loss at the end when you trained with CASIA?

zjz5250 avatar Mar 22 '18 06:03 zjz5250

@zjz5250 If lambda_min = 5, the loss is less than 1.0 after convergence, and acc > 97% for SINGLE, acc > 99% for QUADRUPLE. lr = 1e-08 is too small; the smallest value I have used is 1e-05. Maybe you should provide more images for each person. In my experiments, each person has at least 50 images.

zuoqing1988 avatar Mar 22 '18 07:03 zuoqing1988

@zuoqing1988 Can lambda_min be a large number? For example: base: 50, lambda_min: 50.

We tried training like this and found the loss can converge to 0.85 for DOUBLE on the MS_Celeb_1M dataset.

zjz5250 avatar Apr 10 '18 05:04 zjz5250

@zuoqing1988 I tried your proposal for base, lambda, and lr, and it does not work. SO UPSET!! In the layer "MarginInnerProduct", besides parameters like "base, gamma, lambda", there is a special parameter "iteration", whose default value is 0. Do you know what this parameter means, and does it affect the result of the finetune?

MengWangTHU avatar Apr 10 '18 06:04 MengWangTHU

@MengWangTHU lambda = max(lambda_min, base * (1 + gamma * iteration)^(-power))
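A worked example of that schedule, using the base = 1000, gamma = 0.12, power = 1, lambda_min = 5 values quoted earlier in this thread (the iteration counts are illustrative):

```
iteration = 0:     lambda = max(5, 1000 * (1 + 0.12 * 0)^-1)    = 1000
iteration = 100:   lambda = max(5, 1000 * (1 + 0.12 * 100)^-1)  ≈ 76.9
iteration = 2000:  lambda = max(5, 1000 * (1 + 0.12 * 2000)^-1) = 5  (floored at lambda_min)
```

If the layer follows the stock sphereface implementation, this counter starts at the prototxt's iteration value and increments once per training iteration, so leaving iteration: 0 when finetuning restarts the annealing from lambda = base.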

@zjz5250 A larger lambda_min leads to lower accuracy, but it is easier to converge.

zuoqing1988 avatar Apr 10 '18 07:04 zuoqing1988

@zuoqing1988

What do you mean by "large" for datasets?

"Large" as in a large number of identities, or a large number of images?

twmht avatar Jul 03 '18 08:07 twmht