Not getting a good accuracy
Hi @farmingyard ,
I ran your deploy prototxt on ImageNet this weekend but still got poor accuracy (exactly the same prototxt).
I'd appreciate it if you could share your solver file with me so I can check.
Many thanks!
@leolee96
Here is an example (batch size is 64); you can try it!
net: "train_val.prototxt" #test_initialization: false #test_iter: 100 #test_interval: 5000 display: 40 average_loss: 40 base_lr: 0.01 lr_policy: "poly" power: 1.0 max_iter: 1000000 momentum: 0.9 weight_decay: 0.0001 snapshot: 5000 snapshot_prefix: "shufflenet"
@farmingyard Thanks, man! By the way, what accuracy did you get with this? I only got 54% top-1 acc and 79% top-5 acc, while according to the paper the top-1 error should only be around 34.1%.
I tested on two GPUs; this might cause problems if the ShuffleChannel layer doesn't support multi-GPU training. I'm not sure, though. I'll try your solver and see.
Thanks a lot!
@leolee96
I got 62.8% top-1 acc and 84.7% top-5 acc. The result is not as good as the paper's; it still needs tuning...
mark
Hi @farmingyard, I just wonder how you write the prototxt. Do you write code to generate it? If so, can you share it? Thanks.
@zimenglan-sysu-512 You can find it here: https://github.com/farmingyard/Caffe-Net-Generator
Hi @farmingyard ,
Did you finally reach the 65.9% top-1 acc reported in the paper?
I trained with a batch size of 256 for 100 epochs in total, base lr 0.1, decaying the learning rate by 0.1 every 30 epochs.
Yet I only got around 64% acc at the end.
I'd appreciate it if you could share some tricks from your training process.
Thanks a lot!
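For what it's worth, a minimal solver sketch of that schedule (the iteration counts are my own estimates assuming ~1.28M ImageNet training images at a total batch size of 256, i.e. ~5,000 iterations per epoch; the weight decay is copied from the solver shared earlier rather than stated in this comment):

net: "train_val.prototxt"
base_lr: 0.1
lr_policy: "step"
gamma: 0.1            # multiply the learning rate by 0.1 at each step
stepsize: 150000      # ~30 epochs at total batch size 256 (~5,000 iters/epoch)
max_iter: 500000      # ~100 epochs
momentum: 0.9
weight_decay: 0.0001  # assumption: taken from the solver shared above
display: 40
snapshot: 10000
snapshot_prefix: "shufflenet"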
@leolee96 Your model is better than mine. I didn't keep training any longer, so my result is still the same as above.
Hi @farmingyard, @leolee96, I trained ShuffleNet on our own data but got worse results than AlexNet. I'd appreciate it if you could share your training loss curve. Thanks!
Hi @leolee96, can you share your pre-trained model? Thanks.
Hi @leolee96, when you trained ShuffleNet on two GPUs, you said this might cause some problems because the ShuffleChannel layer doesn't support multi-GPU. How did you solve it? I got a "Multi-GPU execution not available - rebuild with USE_NCCL" error; could you give me some advice?
@xiaomr Hi, I'm not sure, though. Since the depthwise conv layer is not designed for all parallel-GPU systems, if you have your own parallel-GPU setup you may need to modify these layers to fit your system. I didn't get this USE_NCCL error even before the modification. Anyway, try running ShuffleNet on a single GPU first.
@leolee96 Thank you for your advice! I have fixed the problem; it seems the depthwise layer can support multi-GPU after all. The problem was that I chose the wrong branch of Caffe~
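(For reference, standard BVLC Caffe only enables multi-GPU training when it is built with USE_NCCL := 1 uncommented in Makefile.config and rebuilt, after which you can pass e.g. -gpu 0,1 to caffe train; whether the custom ShuffleChannel and depthwise layers behave correctly under that data-parallel setup is something to verify on your own branch.)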
Hi @leolee96, did you finally reach the 65.9% val acc? I trained for 90 epochs with a batch_size of 256 on 4 GPUs, base_lr=0.1 divided by 10 every 30 epochs, and wd=4e-5. But I only got 63.3% val acc. Can you give me some advice?
@leolee96 Hi, I am new to deep learning. I want to use Caffe to train ShuffleNet on my own data, but with just one .prototxt file I have no idea how to proceed; could you give me some direction or advice?
I can reproduce the paper's accuracy for a 40-MFLOPs ShuffleNet with TensorFlow (https://github.com/tensorpack/tensorpack/tree/master/examples/ImageNetModels#shufflenet). You can use the configuration there as a reference.
I only get 43% val acc at iteration 400,000. I used your solver.prototxt and changed the deploy.prototxt into a train_val.prototxt. Is that not sufficient training, or is the data preprocessing wrong? Mine is:

transform_param {
  mirror: true
  crop_size: 224
  scale: 0.017
  mean_value: [103.94, 116.78, 123.68]
}

Should I change the preprocessing to:

transform_param {
  mirror: false
  crop_size: 224
  mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
}

or anything else?
@VectorYYYY Do you mean a batch size of 256 on every GPU, or a total batch size of 256 across the 4 GPUs?
According to the paper, the batch size is 256 on each GPU, making a total batch size of 1024. Other settings such as the learning rate schedule are also clearly stated, so I don't know why people would invent their own settings if the goal is to reproduce the result.
A 1080 Ti can only fit a batch size of 64, and I used 4 GPUs for training. But I found the loss gets stuck around 2.1 and cannot decrease, and the model's top-1 accuracy is around 53%.
According to https://arxiv.org/abs/1706.02677, you can use 1/4 the learning rate together with 1/4 the batch size and train for 4x more steps to get roughly the same results.
Besides that, my implementation can actually train a ShuffleNet 1x with batch size 128 on a 1080 Ti, and a ShuffleNet 0.5x with batch size 256.
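As a rough worked example of that scaling rule (assuming a reference setting of total batch size 1024, initial learning rate 0.5, and 300,000 iterations; treat these reference numbers as assumptions rather than values confirmed in this thread):

lr_new    = lr_ref    × (batch_new / batch_ref) = 0.5 × (256 / 1024) = 0.125
iters_new = iters_ref × (batch_ref / batch_new) = 300,000 × 4 = 1,200,000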