Great difficulty reproducing training results
Has anyone trained EfficientNet-B0 from scratch on ImageNet and successfully reproduced the reported results? I used the model in this repo and tried to follow the official hyperparameters as closely as I could. I even implemented a modified RMSprop in PyTorch that matches its TensorFlow counterpart (there's a difference in the treatment of epsilon). I used standard preprocessing. My setup is 8 GPUs, each computing a batch of 32 images, and the learning rate is scaled accordingly (0.016).
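For reference, the epsilon difference between the two frameworks can be sketched like this. This is a plain-Python, single-parameter sketch of the documented update rules, not this repo's actual optimizer code; the `lr`/`rho`/`eps` values are illustrative:

```python
import math

# TensorFlow RMSprop: w -= lr * g / sqrt(ms + eps)    (eps inside the sqrt)
# PyTorch   RMSprop: w -= lr * g / (sqrt(ms) + eps)   (eps outside the sqrt)

def rmsprop_step_tf(w, g, ms, lr=0.016, rho=0.9, eps=1e-3):
    """One TF-style RMSprop step: eps is added before the square root."""
    ms = rho * ms + (1 - rho) * g * g
    return w - lr * g / math.sqrt(ms + eps), ms

def rmsprop_step_pt(w, g, ms, lr=0.016, rho=0.9, eps=1e-3):
    """One PyTorch-style RMSprop step: eps is added after the square root."""
    ms = rho * ms + (1 - rho) * g * g
    return w - lr * g / (math.sqrt(ms) + eps), ms

# With small gradients the two placements give very different step sizes.
w_tf, _ = rmsprop_step_tf(1.0, 1e-4, 0.0)
w_pt, _ = rmsprop_step_pt(1.0, 1e-4, 0.0)
```

With a near-zero accumulator, the TF form divides by roughly `sqrt(eps)` while the PyTorch form divides by roughly `eps`, so the same hyperparameters can produce step sizes that differ by orders of magnitude.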
So far my best effort is still well below the reported numbers (more than 3 points lower in top-1 accuracy). The only difference I can think of is the exponential moving average (EMA) of model weights, which the official TensorFlow repo includes. But I highly doubt that EMA makes such a huge difference.
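For clarity, the EMA in question is an exponential moving average of the model weights, which is used for evaluation. A minimal plain-Python sketch over a dict of weights (the 0.9999 decay is an assumption; the official repo applies this to the full set of model variables):

```python
def ema_update(ema_weights, weights, decay=0.9999):
    """Blend the running average toward the current weights."""
    for k in ema_weights:
        ema_weights[k] = decay * ema_weights[k] + (1 - decay) * weights[k]
    return ema_weights

# Toy run: the averaged weight slowly tracks the raw weight.
ema = {"w": 0.0}
for _ in range(10000):
    ema = ema_update(ema, {"w": 1.0})
# ema["w"] is about 0.632 here: 1 - 0.9999**10000
```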
Is there anything else in the model itself that may change the training dynamics?
Yes. In my own reproduction experiments the best top-1 accuracy barely crossed 0.70.
I have tried to copy every detail of the network and experiment setup in this repo. No idea how to improve the results.
My reproduce: https://github.com/lukemelas/EfficientNet-PyTorch/issues/81
Hi, have you solved this issue?
> My reproduce: #81
>
> Hi, have you solved this issue?

Not yet.
Read the released code. The author uses many tricks in training. I tried to replicate most of them and got 70.45% accuracy. Still far below 76%.
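One such trick is label smoothing; the EfficientNet paper reports a smoothing ratio of 0.1. A sketch of the idea for a single target (illustrative, not the repo's implementation):

```python
def smooth_labels(one_hot, smoothing=0.1):
    """Mix the one-hot target with a uniform distribution over classes."""
    n = len(one_hot)
    return [(1 - smoothing) * y + smoothing / n for y in one_hot]

# 1000-way ImageNet target with the true class at index 42.
target = [0.0] * 1000
target[42] = 1.0
smoothed = smooth_labels(target)
# True class gets 0.9001, every other class gets 0.0001; still sums to 1.
```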
Hi, can you share the training code for what you've tried so far? That could really help.
How do you use multiprocessing?
Hi @LinxiFan

I'm also trying to reproduce the results: https://github.com/kakaobrain/fast-autoaugment/tree/dev/efficientnet

I got the same result on EfficientNet-B0 after I fixed many things, including RMSprop and EMA. (Indeed, EMA had a great impact on training...)

But I got slightly worse results on B1-B4. Still trying to bridge the gap.
> Hi @LinxiFan
>
> I'm also trying to reproduce the results: https://github.com/kakaobrain/fast-autoaugment/tree/dev/efficientnet
>
> I got the same result on EfficientNet-B0 after I fixed many things, including RMSprop and EMA. (Indeed, EMA had a great impact on training...)
>
> But I got slightly worse results on B1-B4. Still trying to bridge the gap.
Hi @ildoonet

- Does EMA mean an exponential moving average LR schedule? Do you mean EMA is much better than a cosine LR schedule? My EfficientNet-B0 experiment improved from 75% to 76.9% after using your RMSprop instead of PyTorch SGD, with a cosine annealing LR schedule.
- Why did you initialize the mean square of the gradient as 1, not 0?

Thank you
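On the second question: to my knowledge, TF1's `RMSPropOptimizer` initializes its mean-square accumulator to ones, while PyTorch starts from zeros, which changes the size of the very first updates. A sketch with illustrative numbers only:

```python
import math

def first_step_size(ms0, g=0.01, lr=0.016, rho=0.9, eps=1e-3):
    """Size of the first RMSprop update for a given accumulator init."""
    ms = rho * ms0 + (1 - rho) * g * g
    return lr * g / math.sqrt(ms + eps)  # TF-style eps inside the sqrt

step_ones = first_step_size(1.0)   # denominator starts near 1 -> small step
step_zeros = first_step_size(0.0)  # denominator near sqrt(eps) -> big step
```

Starting from zeros, the first few updates can be much larger, which may matter early in training before the accumulator warms up.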
@tzm1003306213 could you tell exactly what configuration you used to reach 76.9? I set all parameters as defined in the paper and got only 73.57 without EMA and 75.37 with EMA.
> @tzm1003306213 could you tell exactly what configuration you used to reach 76.9? I set all parameters as defined in the paper and got only 73.57 without EMA and 75.37 with EMA.
@misadows I used all the hyperparameters given in the paper and got the same result as you with EMA: 76.9 with a cosine LR schedule.
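For concreteness, by cosine LR schedule I mean the usual half-cosine decay from the base LR down to zero over training (the 0.016 base LR matches the 8x32 batch setup discussed above; the step counts below are made up):

```python
import math

def cosine_lr(step, total_steps, base_lr=0.016):
    """Half-cosine decay: base_lr at step 0, 0 at total_steps."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))

# Starts at the full base LR, halves midway, and ends at zero.
lrs = [cosine_lr(s, 100) for s in (0, 50, 100)]
```

Note this differs from the paper's stated schedule (exponential decay), so it is a deviation from the official setup, not a reproduction of it.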
I use the torchvision implementation; it's just difficult to find a proper LR. So IMO this isn't a problem with this repo; rather, EfficientNet itself is badly overrated anyway. A model whose results rely on heavy tuning of training hyperparameters is a bad one. Besides, don't put too much trust in academic papers if you can't verify any of them. Most academic SOTA models rely heavily on big datasets like ImageNet. Once datasets become smaller, those models will find their home in the dustbin.