
ImageNet result

AojunZhou opened this issue · 21 comments

I noticed your ImageNet result is 62.2% (top-1); can you share your training log or more detailed training settings?

AojunZhou avatar Dec 22 '17 09:12 AojunZhou

@gngdb do you have training logs to share?

jaxony avatar Dec 23 '17 04:12 jaxony

@jaxony Thanks. Do you know how to replicate the paper's result for the 1x multiplier with groups=3 in PyTorch?

AojunZhou avatar Dec 23 '17 06:12 AojunZhou

Sorry, I didn't keep logs the first time. I'm running it now, trying to match the settings from the paper, and I'll write the full training logs to a file. I had to make some changes to the ImageNet training script, so I'm thinking I should open another pull request with that included?

Also, the pretrained model in the repo at the moment is groups=3 and multiplier 1x as far as I know; those were the default settings.

gngdb avatar Jan 09 '18 14:01 gngdb

@gngdb Thanks for your kind reply. I tried groups=3 with multiplier 1x, but I got only 59.8% validation accuracy after 60 epochs.

AojunZhou avatar Jan 10 '18 08:01 AojunZhou

@gngdb Hi, I forgot to mention that the original paper was updated on 7 Dec 2017 (https://arxiv.org/abs/1707.01083); you can find better accuracy in the new version, probably because of a larger learning rate, larger batch size, and more training time. Good luck!

AojunZhou avatar Jan 11 '18 02:01 AojunZhou

Hi all, I've trained on ImageNet 3 times using this implementation; below are my experiment settings and final results:

1). init learning rate: 0.1, batch size: 256, learning rate divided by 10 every 30 epochs, epochs: 100, groups: 3, transforms:

```python
transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ColorJitter(brightness=0, contrast=0.5, saturation=0.5, hue=0),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])
```

Top-1: 65.2%, Top-5: 86.0%

2). init learning rate: 0.2, batch size: 384, learning rate divided by 10 every 30 epochs, epochs: 100, groups: 3, transforms:

```python
transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])
```

Top-1: 64.8%

3). init learning rate: 0.1, batch size: 256, learning rate divided by 10 every 30 epochs, epochs: 100, groups: 8, transforms:

```python
transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ColorJitter(brightness=0.2, contrast=0.4, saturation=0.4, hue=0),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])
```

Top-1: 64.8%, Top-5: 85.7%

I have only 2 GTX 1080s; due to the limits of GPU memory, I could only use batch size 256 (groups=8) / 384 (groups=3), not the same as the paper. Note that the groups=8 architecture trains much more slowly than groups=3 and consumes more memory.

windid avatar Jan 14 '18 03:01 windid

@windid Hi, thanks. Did you use @jaxony's code directly? What about weight decay?

AojunZhou avatar Jan 14 '18 08:01 AojunZhou

@Zhouaojun Yes, I used this implementation directly; weight decay is set to 4e-5, as mentioned in the paper.
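For anyone reproducing this, the hyperparameters discussed above map onto a standard SGD setup like the following sketch (the tiny `Conv2d` stands in for the actual ShuffleNet model; the momentum value of 0.9 is an assumption, since it isn't stated in this thread):

```python
import torch

# Hypothetical stand-in for the ShuffleNet model from this repo;
# only the optimizer settings matter here.
model = torch.nn.Conv2d(3, 24, kernel_size=3)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # init learning rate from setting 1) above
    momentum=0.9,       # assumed; standard for ImageNet training
    weight_decay=4e-5,  # as in the ShuffleNet paper
)
```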

windid avatar Jan 14 '18 08:01 windid

@windid OK, I will try your first setting, with init learning rate 0.5, batch size 1024, and 4 GPUs.

AojunZhou avatar Jan 14 '18 09:01 AojunZhou

Can you fit this implementation with batch size 1024 on 4 GPUs with groups=8? I've tried, and it's too big. I was going to run it with a batch size of 512, but there wasn't enough time available on the shared server.

gngdb avatar Jan 15 '18 10:01 gngdb

Yes; each of my GPUs has 24 GB of memory.

AojunZhou avatar Jan 16 '18 10:01 AojunZhou

I've now run groups=8, following the training settings described in the paper as closely as I could. Unfortunately, it didn't match the paper's performance: top-1 is 63.372%, top-5 is 84.661%. Almost certainly because I was using a batch size of 512 instead of 1024. But at least I have training logs to share now. The script I ran is here: https://github.com/gngdb/ShuffleNet/blob/master/imagenet/train.py

And the command I ran the script with, along with the full log of training, is here: https://raw.githubusercontent.com/gngdb/ShuffleNet/master/imagenet/training.log

The server was restarted at some point during training, so I had to restart from a checkpoint, which you can see in the logs.

gngdb avatar Jan 21 '18 15:01 gngdb

@gngdb Did you use the linear learning rate decay as indicated in the paper? PyTorch does not have an implementation of that learning rate decay.

BTW, you also need to train for 256 epochs to match the 3e5 iterations in the paper.

bowenc0221 avatar Jul 22 '18 06:07 bowenc0221

@bowenc0221

```python
for param_group in optimizer.param_groups:
    new_lr = args.lr - (float(minibatch_index) * args.lr) / 3e5
    param_group['lr'] = new_lr

bailvwangzi avatar Jul 26 '18 11:07 bailvwangzi

Yes, it is a linear learning rate decay; the comment in this function is just wrong: https://github.com/gngdb/ShuffleNet/blob/master/imagenet/train.py#L297-L301
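As a pure-Python sketch of that schedule (mirroring the snippet above, with `minibatch_index` counting iterations from 0 and the paper's 3e5 total iterations assumed as the default):

```python
def linear_lr(base_lr, minibatch_index, total_iters=3e5):
    """Linearly decay the learning rate from base_lr down to 0 over total_iters."""
    return base_lr - (float(minibatch_index) * base_lr) / total_iters

# The rate starts at base_lr, halves at the midpoint, and hits 0 at the end.
print(linear_lr(0.5, 0))       # 0.5
print(linear_lr(0.5, 150000))  # 0.25
print(linear_lr(0.5, 300000))  # 0.0
```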

gngdb avatar Jul 26 '18 11:07 gngdb

Hi! I am having trouble using the ImageNet weights from this repository. They don't work properly with the default preprocessing from the PyTorch ImageNet example; validation accuracy is around 20%. By trial and error, I found that it gets better if I reverse the channel order and map pixel values into [-1, 1] instead of normalizing. Apparently, this is caused by the sequential-imagenet-dataloader, which uses OpenCV instead of PIL and maps the pixel values like this: https://github.com/BayesWatch/sequential-imagenet-dataloader/blob/8b624b345858289a0829e00e4a863b6f1a093818/imagenet_seq/data.py#L197 However, even after I account for these differences I get only 58.8% instead of the 62.2% reported in the readme. Am I still missing something?
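For anyone hitting the same mismatch, here is a sketch of the adjustment described above. It assumes the weights expect reversed channel order and pixel values in [-1, 1] (as the linked dataloader produces), and that the input batch was already normalized with the standard ImageNet mean/std:

```python
import torch

def adjust(batch):
    """Convert a batch preprocessed the standard PyTorch way (RGB order,
    ImageNet mean/std normalization) into what these pretrained weights
    appear to expect: reversed channels, values in [-1, 1]."""
    mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
    pixels = batch * std + mean   # undo normalization, back to [0, 1]
    pixels = pixels.flip(1)       # reverse channel order (RGB -> BGR)
    return pixels * 2.0 - 1.0     # map [0, 1] -> [-1, 1]
```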

vadim-v-lebedev avatar Jul 27 '18 11:07 vadim-v-lebedev

That's odd. Sorry, I don't have much time to look into it at the moment. What code are you using to test the accuracy? Could be something small, like the network being in train rather than eval mode?
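One quick way to check the train/eval suspicion: BatchNorm layers give different outputs for the same input in the two modes, which is exactly why evaluating in train mode hurts accuracy. A minimal sketch (just a bare `BatchNorm2d`, not this repo's model):

```python
import torch

net = torch.nn.BatchNorm2d(3)
x = torch.randn(4, 3, 8, 8)

net.train()
y_train = net(x)  # normalizes with this batch's statistics (and updates running stats)

net.eval()
y_eval = net(x)   # normalizes with the stored running statistics

# The outputs differ because the two modes use different statistics.
print(torch.allclose(y_train, y_eval))  # False
```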

gngdb avatar Jul 27 '18 16:07 gngdb

@vadim-v-lebedev

  1. Batch size 1024.
  2. How many epochs did you train? If you use linear learning rate decay, running the full 3e5 iterations gives a better result.
  3. Use "slightly less aggressive scale augmentation for data preprocessing", as the paper says; this is useful for small networks. For example, change the default crop area fraction from 0.08-1.0 to 0.5-1.0.

bailvwangzi avatar Jul 27 '18 16:07 bailvwangzi

@gngdb No, the mode is set correctly. I was using the preprocessing from pytorch imagenet example to test the accuracy https://github.com/pytorch/examples/blob/master/imagenet/main.py#L160

vadim-v-lebedev avatar Aug 01 '18 12:08 vadim-v-lebedev

Sorry, I don't know what could have gone wrong in that case. I'll get back to you if I find a moment to check.

gngdb avatar Aug 01 '18 14:08 gngdb

Maybe the interpolation method is different, or cropping behaves differently in some cases; the difference is not too big. Some clarification of the preprocessing method should be added to the readme, though; right now it only mentions the PyTorch ImageNet example.

vadim-v-lebedev avatar Aug 01 '18 14:08 vadim-v-lebedev