ResidualAttentionNetwork-pytorch

It seems that the results reproduced by this code cannot match the results in the original paper?

Open YihangLou opened this issue 7 years ago • 106 comments

YihangLou avatar Mar 20 '18 03:03 YihangLou

OK, maybe I will try some image pre-processing and tune the hyperparameters to achieve that. But this code performed well in my own implementation for medical image recognition.

tengshaofeng avatar Mar 20 '18 07:03 tengshaofeng

Thanks for sharing your code. Maybe there are many tricks in the original implementation, but the gap from the paper's reported results is too large. I hope you can fully reproduce the results in the future!

YihangLou avatar Mar 20 '18 08:03 YihangLou

OK, I will try to pre-process the images and keep the training process the same as in the paper. The current code does not do the padding, cropping, flipping and so on; I use Adam (the paper uses SGD), and I only trained for 100 epochs (about 204 epochs in the paper).
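
For reference, the standard CIFAR-10 augmentation mentioned here (pad, random crop, flip) can be written with torchvision; a minimal sketch, where the normalization statistics are commonly used CIFAR-10 values rather than this repo's exact settings:

from torchvision import transforms

# Standard CIFAR-10 augmentation: pad 4px, random 32x32 crop, random flip.
# The mean/std values below are the commonly used CIFAR-10 statistics,
# not necessarily what this repo ends up using.
train_transform = transforms.Compose([
    transforms.Pad(4),
    transforms.RandomCrop(32),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2470, 0.2435, 0.2616)),
])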

tengshaofeng avatar Mar 20 '18 08:03 tengshaofeng

@YihangLou , hi, today I modified something and got a new result: accuracy on the CIFAR-10 test set is 92.66%.

tengshaofeng avatar Mar 21 '18 13:03 tengshaofeng

@YihangLou , I modified the optimizer, so the newest result is now 0.9354.
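
For context, a minimal sketch of swapping Adam for the SGD setup the paper describes; the momentum and weight-decay values here are assumptions, not confirmed from this repo's final code:

import torch
from torch.optim import SGD

model = torch.nn.Linear(10, 2)  # stand-in model, just for illustration

# SGD with momentum, as in the paper's training setup; the momentum,
# weight-decay and nesterov settings below are assumed values.
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9,
                weight_decay=1e-4, nesterov=True)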

tengshaofeng avatar Apr 04 '18 01:04 tengshaofeng

Hi @tengshaofeng, was the result you got (0.9354) obtained using only the ResidualAttentionModel_92_32input network in the train.py file? Or do you first pretrain the network with train_pre.py and then train with train.py?

josianerodrigues avatar May 16 '18 15:05 josianerodrigues

Can you provide a trained model?

123moon avatar May 18 '18 05:05 123moon

@josianerodrigues , use only train.py; train_pre.py is just my backup of the code.

tengshaofeng avatar May 18 '18 06:05 tengshaofeng

@123moon , I have provided the model from the final epoch; its accuracy is 0.9332.

tengshaofeng avatar May 18 '18 07:05 tengshaofeng

The model you provide is the 92-32 one. Do you have a model for the ImageNet 224 dataset? I may not be able to explain clearly in English, sorry to bother you: do you have a trained model for 224*224 images? Your code has been very helpful to me, but I have no way to download this dataset, so I'm asking for your help.

123moon avatar May 19 '18 15:05 123moon

@123moon , there is a 224*224 training model: the ResidualAttentionModel_92 class in the residual_attention_network.py file. To download ImageNet you can visit http://image-net.org/download; you need to register an account yourself.

tengshaofeng avatar May 22 '18 02:05 tengshaofeng

Yes, I saw that. What I wanted to ask is whether there is an already trained model; training it myself would take a long time, and my computer doesn't have enough memory, sigh.

123moon avatar May 22 '18 02:05 123moon

I don't have one; my computer also doesn't have enough storage for such a large dataset.

tengshaofeng avatar May 22 '18 06:05 tengshaofeng

Hi @tengshaofeng, could you tell me what the effect of resetting the learning rate at a particular epoch is?

# Decaying learning rate: divide the LR by 10 at 30%, 60% and 90% of training
if (epoch + 1) / float(total_epoch) == 0.3 or \
        (epoch + 1) / float(total_epoch) == 0.6 or \
        (epoch + 1) / float(total_epoch) == 0.9:
    lr /= 10
    print('reset learning rate to:', lr)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
        print(param_group['lr'])

josianerodrigues avatar May 22 '18 17:05 josianerodrigues

@josianerodrigues , it is a training trick. When I decrease the learning rate, the loss starts decreasing quickly again. That is, when I used lr=0.1 to train for 90 epochs and found the loss tending to converge, I decreased to lr=0.01 and the loss decreased again.
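
The same step decay can be expressed with PyTorch's built-in scheduler, which avoids the fragile float-equality test in the snippet above; a minimal sketch, where total_epoch = 300 is an illustrative assumption rather than this repo's setting:

import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(10, 2)           # stand-in model for illustration
optimizer = SGD(model.parameters(), lr=0.1)

# Divide the LR by 10 at 30%, 60% and 90% of training.
total_epoch = 300                        # assumed value, for illustration only
milestones = [int(total_epoch * f) for f in (0.3, 0.6, 0.9)]  # [90, 180, 270]
scheduler = MultiStepLR(optimizer, milestones=milestones, gamma=0.1)

for epoch in range(total_epoch):
    # ... one epoch of training would run here ...
    scheduler.step()                     # apply the decay after each epoch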

tengshaofeng avatar May 23 '18 01:05 tengshaofeng

thanks for the explanation :)

josianerodrigues avatar May 23 '18 16:05 josianerodrigues

Hi @tengshaofeng, I also work on medical images. You mentioned this code worked well in your own medical image recognition implementation. I am having trouble classifying a medical image dataset. Could you tell me more details about it at your convenience? Or could you add my QQ if possible? My QQ number is 1922525328.

Thanks.

zhangrong1722 avatar Jun 10 '18 02:06 zhangrong1722

@estelle1722 , I use the 448-input model; it converges well.

tengshaofeng avatar Jun 11 '18 06:06 tengshaofeng

Hi, I also could not reproduce the paper's results on CIFAR-10 (with my implementation in TensorFlow), even after exchanging a few emails with the author.

ondrejbiza avatar Aug 06 '18 15:08 ondrejbiza

@ondrejba , what is your best result now?

tengshaofeng avatar Aug 07 '18 07:08 tengshaofeng

My best accuracy was 94.32%, which is close to the 95.01% reported in the paper, but it does not beat ResNet-164 with fewer parameters.

ondrejbiza avatar Aug 07 '18 18:08 ondrejbiza

@ondrejba , OK, your result is really better. Have you looked at the ResidualAttentionModel_92_32input architecture in my code? Are there any differences from yours? Or could you share your code with me?

tengshaofeng avatar Aug 08 '18 09:08 tengshaofeng

I'm sorry for the delay. I'll look at your code over the weekend.

ondrejbiza avatar Aug 14 '18 15:08 ondrejbiza

@ondrejba thanks

tengshaofeng avatar Aug 15 '18 02:08 tengshaofeng

I noticed many differences just from looking at residual_attention_network.py:

  • I use filter size 3 in the first convolution, you use 5 (probably not important)
  • I don't use max pooling (downsampling 32x32 images after the first convolution is not a good idea)
  • my filter counts for the three scales are [64, 128, 256] whereas you have [256, 512, 1024] filters

I bet there are more differences, but I don't have time to go through the whole attention module; a rough sketch of the stem difference is below. I hope this helps.
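
To make the first two points concrete, a minimal sketch of the two network stems as I read them; the exact channel counts, paddings and normalization layers here are illustrative assumptions, not copied from either codebase:

import torch.nn as nn

# Stem in this repo (as I read it): 5x5 conv followed by max pooling,
# which immediately downsamples the 32x32 CIFAR input.
repo_stem = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=2),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),  # 32x32 -> 16x16
)

# Stem in my implementation: 3x3 conv and no pooling, so the attention
# modules see the full 32x32 resolution.
my_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)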

ondrejbiza avatar Aug 20 '18 22:08 ondrejbiza

I'm actually surprised that you achieved such a good CIFAR accuracy with max pooling at the start of the network.

ondrejbiza avatar Aug 20 '18 22:08 ondrejbiza

Hi @ondrejba, if possible could you make your code available? On which dataset and with which network did you get 94% accuracy? ResNet-164? What do you use after the first convolution? Sorry for taking your time.

josianerodrigues avatar Aug 21 '18 17:08 josianerodrigues

Hello, I got 94.32% accuracy with Attention92 on CIFAR-10. The 95.01% accuracy I mentioned is also for Attention92 evaluated on CIFAR-10; it was reported in the Residual Attention Networks paper but I didn't manage to replicate the results. I will look into open-sourcing my code.

After the first convolution ... there are all the other convolutions in the network, followed by average pooling and a single fully-connected layer. This architecture is described in the Residual Attention Networks paper as well as the Identity Mappings paper, which is a follow-up to the Deep Residual Learning paper.
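
In PyTorch terms, a minimal sketch of that classifier head (global average pooling plus one fully-connected layer); the channel count and number of classes are illustrative assumptions:

import torch
import torch.nn as nn

# Classifier head after the final convolutional stage: global average
# pooling collapses each feature map to one value, then a single FC layer
# produces the class logits. 256 channels / 10 classes are assumed values.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),          # (N, 256, H, W) -> (N, 256, 1, 1)
    nn.Flatten(),                     # -> (N, 256)
    nn.Linear(256, 10),               # -> (N, 10) logits for CIFAR-10
)

features = torch.randn(4, 256, 8, 8)  # dummy feature maps
logits = head(features)               # torch.Size([4, 10])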

Cheers, Ondrej

ondrejbiza avatar Aug 21 '18 18:08 ondrejbiza

Thank you :)

josianerodrigues avatar Aug 24 '18 13:08 josianerodrigues

You're welcome! Let me know if you manage to reproduce Fei Wang's results.

ondrejbiza avatar Aug 24 '18 15:08 ondrejbiza