ShuffleNet
ImageNet result
I noticed your ImageNet result is 62.2% (top-1). Can you share your training log or more detailed training settings?
@gngdb do you have training logs to share?
@jaxony Thanks. Do you know how to replicate the paper's result for the 1x model with groups=3 in PyTorch?
Sorry, I didn't keep logs the first time. I'm running it now, trying to match the settings from the paper, and I'll write the full training logs to a file. I have to make some changes to the imagenet training script, so I'm thinking I should just open another pull request with that included?
Also, as far as I know the pretrained model in the repo at the moment is groups=3 and multiplier 1x; those were the default settings.
@gngdb Thanks for your kind reply. I tried groups=3 and multiplier 1x, but got only 59.8% validation accuracy after 60 epochs.
@gngdb Hi, I forgot to mention that the original paper was updated on 7 Dec 2017: https://arxiv.org/abs/1707.01083 . You can find better accuracy in the new version, maybe because of a larger learning rate, larger batch size, and more training time. Good luck!
Hi all, I've trained on ImageNet 3 times using this implementation. Below are my experiment settings and final results (a PyTorch sketch of setting 1 follows after the list):
1).
init learning rate: 0.1
batch size: 256
learning rate divided by 10 for every 30 epochs
epochs: 100
groups:3
transforms:
    transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.ColorJitter(brightness=0, contrast=0.5, saturation=0.5, hue=0),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ])
Top1: 65.2%, Top5: 86.0%
2).
init learning rate: 0.2
batch size: 384
learning rate divided by 10 for every 30 epochs
epochs: 100
groups:3
transforms:
    transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ])
Top1: 64.8%
3).
init learning rate: 0.1
batch size: 256
learning rate divided by 10 for every 30 epochs
epochs: 100
groups: 8
transforms:
    transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.ColorJitter(brightness=0.2, contrast=0.4, saturation=0.4, hue=0),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ])
Top1: 64.8%, Top5: 85.7%
I have only 2 GTX 1080s. Due to GPU memory limits, I could only use batch size 256 (groups=8) / 384 (groups=3), not the same as the paper. Note that the groups=8 architecture trains much more slowly than groups=3 and consumes more memory.
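In PyTorch terms, setting 1 above is roughly the sketch below. The Linear layer is just a stand-in for the ShuffleNet model; momentum 0.9 is my assumption taken from the standard pytorch imagenet example, and weight decay 4e-5 is from the paper, as I mention in my reply further down.

    import torch
    from torch.optim.lr_scheduler import StepLR

    model = torch.nn.Linear(10, 10)  # stand-in for the ShuffleNet model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=4e-5)
    # "learning rate divided by 10 for every 30 epochs"
    scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

    for epoch in range(100):
        # train for one epoch here, then step the schedule
        scheduler.step()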
@windid Hi, thanks. Did you use @jaxony's code directly? What about weight decay?
@Zhouaojun Yes, I used this implementation directly; weight decay is set to 4e-5, as mentioned in the paper.
@windid OK, I will try your first setting, but with init learning rate 0.5, batch size 1024, and 4 GPUs.
Can you fit this implementation with batch size 1024 on 4 GPUs with groups=8? I've tried, and it's too big. I was going to run it with a batch size of 512, but there wasn't enough time available on the shared server.
Yes, a single one of my GPUs has 24 GB of memory.
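For what it's worth, splitting the global batch across GPUs works the same way as in the standard pytorch imagenet example, via DataParallel. A minimal sketch (the Linear layer just stands in for the model):

    import torch
    import torch.nn as nn

    # DataParallel splits each batch across the visible GPUs, so with 4 GPUs
    # and a global batch size of 1024 each GPU holds a sub-batch of 256,
    # which is what determines per-GPU memory use.
    model = nn.Linear(224 * 224 * 3, 1000)  # stand-in for ShuffleNet
    if torch.cuda.is_available():
        model = nn.DataParallel(model).cuda()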
I've now run groups=8, following the training settings described in the paper as closely as I could. Unfortunately, it didn't match the paper's performance: Top 1 is 63.372%, Top 5 is 84.661%. That's almost certainly because I was using a batch size of 512 instead of 1024. But at least I've got training logs to share now. The script I ran is here: https://github.com/gngdb/ShuffleNet/blob/master/imagenet/train.py
And the command I ran the script with, along with the full log of training, is here: https://raw.githubusercontent.com/gngdb/ShuffleNet/master/imagenet/training.log
The server was restarted at some point during training, so I had to restart from a checkpoint, which you can see in the logs.
@gngdb Did you use the linear learning rate decay as indicated in the paper? PyTorch does not have an implementation of that learning rate decay.
BTW, you also need to train for 256 epochs to match the 3e5 iterations in the paper.
@bowenc0221

    for param_group in optimizer.param_groups:
        new_lr = args.lr - (float(minibatch_index) * args.lr) / 3e5
        param_group['lr'] = new_lr
Yes, it is a linear learning rate decay; the comment in this function is just wrong: https://github.com/gngdb/ShuffleNet/blob/master/imagenet/train.py#L297-L301
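The same decay can also be expressed with PyTorch's LambdaLR, stepped once per minibatch rather than per epoch. A sketch (the Linear layer is just a stand-in model; LambdaLR multiplies the initial lr by the returned factor each step):

    import torch
    from torch.optim.lr_scheduler import LambdaLR

    model = torch.nn.Linear(10, 10)  # stand-in model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
    total_iters = 3e5  # total minibatches, as in the paper
    scheduler = LambdaLR(optimizer,
                         lr_lambda=lambda it: max(0.0, 1.0 - it / total_iters))
    # call scheduler.step() after every minibatch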
Hi! I am having trouble using the ImageNet weights from this repository. They don't work properly with the default preprocessing from the pytorch imagenet example; the validation accuracy is around 20%. By trial and error, I found that it gets better if I reverse the channel order and map pixel values into [-1, 1] instead of normalizing. Apparently this is caused by the sequential-imagenet-dataloader, which uses OpenCV instead of PIL and maps the pixel values like this: https://github.com/BayesWatch/sequential-imagenet-dataloader/blob/8b624b345858289a0829e00e4a863b6f1a093818/imagenet_seq/data.py#L197 However, even after I account for these problems I get only 58.8% instead of the 62.2% reported in the readme. Am I still missing something?
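Concretely, the preprocessing that got me to 58.8% was roughly this (a sketch of what worked for me; the exact scaling in the OpenCV-based loader may differ):

    import numpy as np

    def preprocess(pil_image):
        # pil_image is a PIL RGB image
        img = np.asarray(pil_image.resize((224, 224)), dtype=np.float32)
        img = img[:, :, ::-1].copy()    # RGB -> BGR, as OpenCV loads images
        img = img / 127.5 - 1.0         # map [0, 255] into [-1, 1]
        return img.transpose(2, 0, 1)   # HWC -> CHW for PyTorch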
That's odd. Sorry, I don't have much time to look into it at the moment. What code are you using to test the accuracy? Could be something small, like the network being in train rather than eval mode?
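For example (a sketch; the Sequential is just a stand-in, but BatchNorm left in train mode alone can tank validation accuracy):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
    model.eval()               # use running BatchNorm statistics
    with torch.no_grad():      # no autograd bookkeeping at test time
        out = model(torch.randn(1, 3, 224, 224))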
@vadim-v-lebedev
- Batch size 1024.
- How many epochs did you train? If you use linear learning rate decay, training the full 3e5 iterations gives a better result.
- The paper says to "use slightly less aggressive scale augmentation for data preprocessing", which helps small networks; for example, change the default crop area fraction from 0.08-1.0 to 0.5-1.0, as in the sketch below.
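With torchvision that change is just (a sketch):

    import torchvision.transforms as transforms

    # Raise the lower bound of the crop area fraction from the
    # torchvision default of 0.08 to 0.5 for less aggressive scaling.
    crop = transforms.RandomResizedCrop(224, scale=(0.5, 1.0))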
@gngdb No, the mode is set correctly. I was using the preprocessing from the pytorch imagenet example to test the accuracy: https://github.com/pytorch/examples/blob/master/imagenet/main.py#L160
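For reference, the validation transform in that example is roughly this (a sketch; normalize uses the usual ImageNet mean/std):

    import torchvision.transforms as transforms

    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])
    val_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        normalize,
    ])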
Sorry, I don't know what could have gone wrong in that case. I'll get back to you if I find a moment to check.
Maybe the interpolation method is different, or cropping behaves differently in some cases; the difference is not too big. Some clarification of the preprocessing method should be added to the readme, though. Right now it only mentions the pytorch imagenet example.