
Great difficulty reproducing training

Open DrJimFan opened this issue 6 years ago • 10 comments

Has anyone trained EfficientNet-B0 from scratch on ImageNet and successfully reproduced the results? I used the model in this repo and tried to follow the hyperparameters as closely as I could. I even implemented a modified RMSprop in PyTorch that matches its TensorFlow counterpart (there's a difference in the treatment of epsilon). I used standard preprocessing. My setup is 8 GPUs, each computing a batch of 32 images. The learning rate is properly scaled (0.016).
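For readers unfamiliar with the epsilon difference mentioned above, it can be sketched as follows (function names and hyperparameter values here are illustrative, not the OP's actual code):

```python
import torch

def rmsprop_step_pytorch(p, g, ms, lr=0.016, alpha=0.9, eps=1e-3):
    """PyTorch-style RMSprop update: epsilon is added OUTSIDE the sqrt."""
    ms.mul_(alpha).addcmul_(g, g, value=1 - alpha)   # running mean of g^2
    p.sub_(lr * g / (ms.sqrt() + eps))

def rmsprop_step_tensorflow(p, g, ms, lr=0.016, alpha=0.9, eps=1e-3):
    """TF-style RMSprop update: epsilon is added INSIDE the sqrt."""
    ms.mul_(alpha).addcmul_(g, g, value=1 - alpha)
    p.sub_(lr * g / (ms + eps).sqrt())
```

With a relatively large epsilon like TF EfficientNet's 1e-3, the two variants take visibly different step sizes early in training, when the accumulator is still small.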

So far my best effort is quite far below the reported numbers (> 3 points lower in top-1 accuracy). The only difference I can think of is the exponential moving average (EMA) of model weights, which the official TensorFlow repo includes. But I highly doubt that EMA makes such a huge difference.

Is there anything else in the model itself that may change the training dynamics?

DrJimFan avatar Aug 20 '19 18:08 DrJimFan

Yes. In my own reproduction experiments, the best top-1 accuracy barely exceeds 0.70.

I have tried to copy every corner of the network and experiment setup in this repo. No idea how to improve the results.

allanwlz avatar Sep 07 '19 13:09 allanwlz

My reproduction attempt: https://github.com/lukemelas/EfficientNet-PyTorch/issues/81

Hi, have you solved this problem?

zhjpqq avatar Sep 27 '19 23:09 zhjpqq

> My reproduction attempt: #81
>
> Hi, have you solved this problem?

Not yet.

I read the released code. The author uses many tricks in training. I tried to replicate most of them and got 70.45% accuracy, still far below 76%.

allanwlz avatar Oct 11 '19 07:10 allanwlz

Hi, can you share the training code you've tried so far? That could really help.

Esaada avatar Nov 05 '19 13:11 Esaada

How do you use multiprocessing?

mathczh avatar Dec 27 '19 05:12 mathczh

Hi @LinxiFan

I'm also trying to reproduce the results: https://github.com/kakaobrain/fast-autoaugment/tree/dev/efficientnet

I matched the reported result on EfficientNet-B0 after fixing many things, including RMSProp and EMA.

(Indeed, EMA had a great impact on training...)

But I got slightly worse results on B1-B4. Still trying to bridge the gap.

ildoonet avatar Jan 23 '20 05:01 ildoonet

> Hi @LinxiFan
>
> I'm also trying to reproduce the results: https://github.com/kakaobrain/fast-autoaugment/tree/dev/efficientnet
>
> I matched the reported result on EfficientNet-B0 after fixing many things, including RMSProp and EMA.
>
> (Indeed, EMA had a great impact on training...)
>
> But I got slightly worse results on B1-B4. Still trying to bridge the gap.

Hi @ildoonet

  1. Does EMA mean an exponential moving average LR schedule? Do you mean EMA is much better than a cosine LR schedule? My EfficientNet-B0 experiment improved from 75% to 76.9% after switching from PyTorch SGD to your RMSProp. A cosine annealing LR schedule was used.

  2. Why did you initialize the mean square of the gradient to 1 instead of 0?

Thank you
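Regarding question 2: TF's RMSprop initializes the squared-gradient accumulator to 1, whereas `torch.optim.RMSprop` starts it at 0. A small sketch of why that matters on the first step (values are illustrative):

```python
import torch

g = torch.tensor([0.1])          # first gradient seen
lr, alpha, eps = 0.016, 0.9, 1e-3

# accumulator initialized to 0 (PyTorch default): the first denominator
# is tiny, so the first step is large
ms0 = alpha * torch.zeros(1) + (1 - alpha) * g**2
step_from_zero = lr * g / (ms0 + eps).sqrt()

# accumulator initialized to 1 (TF behavior): the first steps are much
# smaller, which stabilizes early training
ms1 = alpha * torch.ones(1) + (1 - alpha) * g**2
step_from_one = lr * g / (ms1 + eps).sqrt()
```

The zero-initialized variant takes a first step more than an order of magnitude larger here, which can matter at EfficientNet's relatively high learning rates.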

tzm1003306213 avatar Jul 06 '20 03:07 tzm1003306213

@tzm1003306213 could you tell us exactly what configuration you used to reach 76.9? I set all parameters as defined in the paper and got only 73.57 without EMA and 75.37 with EMA.

misadows avatar Jul 10 '20 11:07 misadows

> @tzm1003306213 could you tell us exactly what configuration you used to reach 76.9? I set all parameters as defined in the paper and got only 73.57 without EMA and 75.37 with EMA.

@misadows I used all the hyper-parameters given in the paper and got the same result as you with EMA; with a cosine LR schedule I got 76.9.
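One way the cosine schedule mentioned here could be wired in, assuming the paper's 350-epoch run (this is a sketch with a stand-in model, not the commenter's actual code):

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for EfficientNet-B0
optimizer = torch.optim.SGD(model.parameters(), lr=0.016)

# anneal the LR from 0.016 down to ~0 over the whole run
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=350)

for epoch in range(350):
    # ... train one epoch ...
    scheduler.step()
```

The paper itself describes exponential LR decay (0.97 every 2.4 epochs), so the cosine schedule here is a deviation from the paper's recipe, not a reproduction of it.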

tzm1003306213 avatar Jul 11 '20 01:07 tzm1003306213

I use the torchvision implementation; it's just difficult to find a proper learning rate. So IMO this isn't a problem with this repo; rather, EfficientNet itself is badly overrated. A model whose results depend on heavy tuning of training hyperparameters is a bad one. Besides, don't place too much trust in academic papers when you can't verify their claims. Most made-in-academia SOTA models rely heavily on big datasets like ImageNet. Once the datasets get smaller, those models will find their home in the dustbin.

sipie800 avatar May 03 '23 01:05 sipie800