Architecture discussions
Thanks for this repo! I managed to obtain only ~40-45% top-1, while it looks like you achieved ~69%.
Among the major architecture differences I noticed only ReLU6. Did it boost accuracy, or is it just inherited from MobileNet?
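For reference, ReLU6 is just a ReLU clamped at 6; the MobileNet papers motivate it by robustness with low-precision arithmetic. A minimal scalar sketch (illustrative only, not code from this repo):

```python
def relu6(x):
    """ReLU6: clamp activations to the range [0, 6]."""
    return min(max(0.0, x), 6.0)

print(relu6(-3.0))  # 0.0 (negative inputs are zeroed, as in plain ReLU)
print(relu6(2.5))   # 2.5 (passes through unchanged)
print(relu6(9.0))   # 6.0 (capped at 6)
```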
Starting from lr 0.1, decayed by 0.5 every 20 epochs.
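That schedule (start at 0.1, halve every 20 epochs) is a simple step function; in PyTorch it would correspond to `StepLR(step_size=20, gamma=0.5)`. A framework-free sketch, with an illustrative helper name:

```python
def step_lr(epoch, base_lr=0.1, gamma=0.5, step=20):
    """Learning rate at a given epoch: multiplied by `gamma`
    once every `step` epochs."""
    return base_lr * gamma ** (epoch // step)

print(step_lr(0))   # 0.1
print(step_lr(20))  # 0.05
print(step_lr(45))  # 0.025 (two decays have happened by epoch 45)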
This also points me towards the OpenAI AdamW implementation: it is more or less a continuous version of your training regime. Would be interesting if you tried it; it also converges quite quickly.
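For context, AdamW's distinguishing feature is decoupled weight decay: the decay is applied directly to the weights rather than folded into the gradient as L2 regularization. A minimal single-scalar sketch of one update step (hyperparameter defaults are just the common ones, not anyone's actual config):

```python
import math

def adamw_step(p, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update for a scalar parameter p at step t (t >= 1)."""
    m = betas[0] * m + (1 - betas[0]) * grad          # first moment
    v = betas[1] * v + (1 - betas[1]) * grad * grad   # second moment
    m_hat = m / (1 - betas[0] ** t)                   # bias correction
    v_hat = v / (1 - betas[1] ** t)
    p = p - lr * weight_decay * p                     # decoupled decay
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)     # Adam step
    return p, m, v

p, m, v = adamw_step(1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(p)  # slightly below 1.0: decay plus one Adam step
```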
Batch size 256 with 2 K80 GPUs.
There is some evidence that for such models a batch size of 1000-2000 is preferable =(
Hi snakers4, thx for your advice.
I had checked your repo before. As for why it is better: with plain SGD it already achieved 65% top-1; the reason might be the ReLU variant. During my training of MnasNet I found that the representation power is a little weak, since the training loss is higher than the test loss, so I changed the dropout rate from the default 0.5 to 0.0, which indeed boosted performance to 68%.
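Setting the dropout rate to 0.0 turns the layer into an identity even at train time, which matches the underfitting signal described (train loss above test loss). A minimal inverted-dropout sketch illustrating that; the function is illustrative, not from the repo:

```python
import random

def dropout(xs, p, training=True):
    """Inverted dropout: zero each unit with probability p and scale
    survivors by 1/(1-p). With p=0.0 (or at eval) it is the identity."""
    if not training or p == 0.0:
        return list(xs)
    return [0.0 if random.random() < p else x / (1.0 - p) for x in xs]

print(dropout([1.0, 2.0], p=0.0))                  # [1.0, 2.0] -- dropout off
print(dropout([1.0, 2.0], p=0.5, training=False))  # [1.0, 2.0] -- eval mode
```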
I have also tried adam and rmsprop, but they just cannot converge in my case.
> if I just use SGD, it already achieved 65% top-1
You mean just using my model with SGD or your model?
> so I changed the dropout rate from the default 0.5 to 0.0, which indeed boosted performance to 68%
Interesting, afaik we did not use any dropout at all
> I have also tried adam and rmsprop, but they just cannot converge in my case.
Interesting. Well, anyway, just give adamw and a larger batch a try =)
Also, @Randl trained MobileNetV2 with Adam and SGD: Adam converged 3x faster, but SGD ended up only +1 pp better ...
All of this tells me that the newer networks are getting more and more fragile ...
My model + SGD
Agree with you, newer nets should be carefully tuned. Still don't know how the paper got 74%. Maybe a large batch size matters, but currently I don't have that much compute to try it.
We will see what @Randl will comment, he has more GPUs now afaik
@billhhh I use this code, but the loss does not change. Could you help me solve it?
I've managed to achieve 72+% top-1, however, I also managed to fuck up checkpointing, thus there is no checkpoint (yet).
@Randl Wow, that's a pretty good result! Did you use 224 input? How about the other settings, same as mine or different?
> Anyway, just give adamw and a larger batch a try =)
Have you solved the problem? I trained the network but the loss did not drop.