
PyTorch implementations of several vision transformers, trained on CIFAR-10.

Vision-Transformer-CIFAR10

This project collects PyTorch implementations of several recent vision transformers and trains them on the CIFAR-10 dataset.
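
For orientation, here is a minimal sketch (not code from this repo; patch size 4 and embedding width 192 are illustrative choices) of the patch-embedding step that ViT-style models start from: a 32×32 CIFAR-10 image becomes a sequence of 64 patch tokens.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Turn a 32x32 image into non-overlapping patch tokens."""
    def __init__(self, img_size=32, patch_size=4, in_chans=3, embed_dim=192):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2  # (32/4)^2 = 64
        # A strided convolution is the standard trick: each kernel
        # application embeds exactly one non-overlapping patch.
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                     # (B, embed_dim, 8, 8)
        return x.flatten(2).transpose(1, 2)  # (B, 64, embed_dim)

tokens = PatchEmbedding()(torch.randn(2, 3, 32, 32))
print(tokens.shape)  # torch.Size([2, 64, 192])
```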

Usage

1. Training

python train.py -net vit -gpu

2. Evaluation

python test.py -net vit -weights path_to_the_weight

Results

Transformers lack the inductive biases of CNNs, so they typically need large amounts of training data and strong data augmentation to perform well. This implementation has not yet tuned data augmentation or hyperparameters such as the learning rate.
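
As a concrete point of reference, the following is a common CIFAR-10 augmentation recipe (hypothetical here; the actual pipeline in train.py may differ), of the kind such tuning would start from:

```python
import torchvision.transforms as T

# Standard CIFAR-10 statistics for normalization.
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD = (0.2470, 0.2435, 0.2616)

# Stronger augmentation tends to matter more for transformers than
# for CNNs on a dataset this small.
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),   # random translations via padding + crop
    T.RandomHorizontalFlip(),      # exploit horizontal symmetry
    T.ToTensor(),
    T.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])
```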

Notes

Below are some notes taken while studying the reference implementations.

Done

  • [x] [ViT]: https://arxiv.org/abs/2010.11929

  • [x] [DeiT]: https://arxiv.org/abs/2012.12877

  • [x] [DeepViT]: https://arxiv.org/abs/2103.11886

  • [x] [CaiT]: https://arxiv.org/abs/2103.17239

  • [x] [CeiT]: https://arxiv.org/abs/2103.11816

  • [x] [CPVT]: https://arxiv.org/abs/2102.10882

  • [x] [CvT]: https://arxiv.org/abs/2103.15808

  • [x] [LeViT]: https://arxiv.org/abs/2104.01136

  • [x] [PVT]: https://arxiv.org/abs/2102.12122

  • [x] [PVTv2]: https://arxiv.org/abs/2106.13797

  • [x] [Swin]: https://arxiv.org/abs/2103.14030

  • [x] [Shuffle]: https://arxiv.org/abs/2106.03650

TODO

  • [ ] Data augmentation (CutMix, Mixup, etc.; see the Mixup sketch after this list)

  • [ ] Hyperparameter tuning

  • [ ] [PiT]: https://arxiv.org/abs/2103.16302

  • [ ] [Tokens-to-Token]: https://arxiv.org/abs/2101.11986

  • [ ] [CrossViT]: https://arxiv.org/abs/2103.14899

  • [ ] [LocalViT]: https://arxiv.org/abs/2104.05707

  • [ ] [Twins]: https://arxiv.org/abs/2104.13840
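
As a starting point for the data-augmentation item above, here is a minimal Mixup sketch (https://arxiv.org/abs/1710.09412); the helper is hypothetical and not yet part of this repo. Pairs of images and one-hot labels are convexly combined with a Beta-sampled weight, a regularizer that helps data-hungry transformers on small datasets.

```python
import numpy as np
import torch

def mixup(images, labels_onehot, alpha=0.2):
    """Mix each sample with a randomly permuted partner from the batch."""
    lam = np.random.beta(alpha, alpha)          # mixing weight in [0, 1]
    perm = torch.randperm(images.size(0))       # random pairing within batch
    mixed_images = lam * images + (1 - lam) * images[perm]
    mixed_labels = lam * labels_onehot + (1 - lam) * labels_onehot[perm]
    return mixed_images, mixed_labels
```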

References

https://github.com/lucidrains/vit-pytorch

https://github.com/weiaicunzai/pytorch-cifar100

https://github.com/berniwal/swin-transformer-pytorch

https://github.com/whai362/PVT