Not getting a good accuracy
Hi @farmingyard ,
I ran your deploy prototxt on ImageNet this weekend but still got poor accuracy (exactly the same prototxt).
I'd appreciate it if you could share your solver file with me so I can check.
Many thanks!
@leolee96
Here is an example (batch size is 64); you can try it!
net: "train_val.prototxt" #test_initialization: false #test_iter: 100 #test_interval: 5000 display: 40 average_loss: 40 base_lr: 0.01 lr_policy: "poly" power: 1.0 max_iter: 1000000 momentum: 0.9 weight_decay: 0.0001 snapshot: 5000 snapshot_prefix: "shufflenet"
@farmingyard Thanks, man! By the way, what accuracy did you get with this? I only got 54% top-1 acc and 79% top-5 acc, while according to the paper the top-1 error should only be around 34.1%.
I tested on two GPUs; this might cause problems if the ShuffleChannel layer doesn't support multi-GPU training. I'm not sure, though. I'll try your solver and see.
Thanks a lot!
@leolee96
I got 62.8% top-1 acc and 84.7% top-5 acc. The result is not as good as the paper's; it still needs tuning...
mark
Hi @farmingyard, I just wonder how you write the prototxt. Do you write code to generate it? If so, can you share it? Thanks.
@zimenglan-sysu-512 You can find it here: https://github.com/farmingyard/Caffe-Net-Generator
Hi @farmingyard ,
Did you finally reach the 65.9% top-1 acc reported in the paper?
I trained with a batch size of 256 for 100 epochs in total, base lr 0.1, decaying the learning rate by 0.1 every 30 epochs.
Yet I only got around 64% acc at the end.
I'd appreciate it if you could share some tricks from your training process.
Thanks a lot!
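For what it's worth, a minimal solver sketch of that schedule (the iteration counts are my own estimates assuming ~1.28M ImageNet training images at a total batch size of 256, i.e. ~5,000 iterations per epoch; the weight decay is copied from the solver shared earlier rather than stated in this comment):

net: "train_val.prototxt"
base_lr: 0.1
lr_policy: "step"
gamma: 0.1            # multiply the learning rate by 0.1 at each step
stepsize: 150000      # ~30 epochs at total batch size 256 (~5,000 iters/epoch)
max_iter: 500000      # ~100 epochs
momentum: 0.9
weight_decay: 0.0001  # assumption: taken from the solver shared above
display: 40
snapshot: 10000
snapshot_prefix: "shufflenet"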
@leolee96 Your model is better than mine. I didn't keep training any longer, so my result is still the same as above.
Hi @farmingyard, @leolee96, I trained ShuffleNet on our own data but got worse results than AlexNet. I'd appreciate it if you could share your training loss curve. Thanks!
Hi @leolee96, can you share your pre-trained model? Thanks.
Hi @leolee96, when you trained ShuffleNet on two GPUs, you said this might cause some problems because the ShuffleChannel layer doesn't support multi-GPU. How did you solve it? I got a "Multi-GPU execution not available - rebuild with USE_NCCL" error; could you give me some advice?
@xiaomr Hi, I'm not sure, though. Since the depthwise conv layer is not designed for all parallel-GPU systems, if you have your own parallel-GPU setup you may need to modify these layers to fit your system. I didn't get this USE_NCCL error even before the modification. Anyway, try running ShuffleNet on a single GPU first.
@leolee96 Thank you for your advice! I have fixed the problem; it seems the depthwise layer can support multi-GPU after all. The problem was that I chose the wrong branch of Caffe~
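(For reference, standard BVLC Caffe only enables multi-GPU training when it is built with USE_NCCL := 1 uncommented in Makefile.config and rebuilt, after which you can pass e.g. -gpu 0,1 to caffe train; whether the custom ShuffleChannel and depthwise layers behave correctly under that data-parallel setup is something to verify on your own branch.)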
Hi @leolee96, did you finally reach the 65.9% val acc? I trained for 90 epochs with a batch_size of 256 on 4 GPUs, base_lr=0.1 divided by 10 every 30 epochs, and wd=4e-5. But I only got 63.3% val acc. Can you give me some advice?
@leolee96 Hi, I am new to deep learning. I want to use Caffe to train ShuffleNet on my own data, but with just one .prototxt file I have no idea how to proceed; could you give me some direction or advice?
I can reproduce the paper's accuracy for a 40-MFLOPs ShuffleNet with TensorFlow (https://github.com/tensorpack/tensorpack/tree/master/examples/ImageNetModels#shufflenet). You can use the configuration there as a reference.
I only get 43% val acc at iteration 400,000. I used your solver.prototxt and changed the deploy.prototxt into a train_val.prototxt. Is that not sufficient training, or is the data preprocessing wrong? Mine is:

transform_param {
  mirror: true
  crop_size: 224
  scale: 0.017
  mean_value: [103.94, 116.78, 123.68]
}

Should I change the preprocessing to:

transform_param {
  mirror: false
  crop_size: 224
  mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
}

or anything else?
@VectorYYYY Do you mean a batch size of 256 on every GPU, or a total batch size of 256 across the 4 GPUs?
According to the paper, the batch size is 256 on each GPU, making a total batch size of 1024. Other settings such as the learning rate schedule are also clearly stated, so I don't know why people would invent their own settings if the goal is to reproduce the result.
A 1080 Ti can only fit a batch size of 64, and I used 4 GPUs for training. But I found the loss gets stuck around 2.1 and cannot decrease, and the model's top-1 accuracy is around 53%.
According to https://arxiv.org/abs/1706.02677, you can use 1/4 the learning rate together with 1/4 the batch size and train for 4x more steps to get roughly the same results.
Besides that, my implementation can actually train a ShuffleNet 1x with batch size 128 on a 1080 Ti, and a ShuffleNet 0.5x with batch size 256.
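As a rough worked example of that scaling rule (assuming a reference setting of total batch size 1024, initial learning rate 0.5, and 300,000 iterations; treat these reference numbers as assumptions rather than values confirmed in this thread):

lr_new    = lr_ref    × (batch_new / batch_ref) = 0.5 × (256 / 1024) = 0.125
iters_new = iters_ref × (batch_ref / batch_new) = 300,000 × 4 = 1,200,000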