Implementation of vision transformer. ⭐⭐⭐

Awesome vision transformer

A curated list of vision-transformer-related resources, including surveys, papers, source code, etc. Maintainer: Murufeng

We are looking for a maintainer! Let me know if you are interested.

Please feel free to open a pull request or an issue to add papers and source code.

Table of Contents

Awesome Survey

Recommended Paper Explanations

Papers

ViT Variants

  • Modified Operators
  • Local & Hierarchical & multi-scale
  • Transformer + Convolution Hybrids
  • Transformer Model Compression & Lightweight Design
  • DETR Variants

MLP Family

Transformers Applied to Other Tasks

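  • 1. Object Detection
  • 2. Super-Resolution
  • 3. Image Segmentation / Semantic Segmentation
  • 4. GAN/Generative/Adversarial
  • 5. Tracking
  • 6. Video
  • 7. Multimodal
  • 8. Human Pose Estimation
  • 9. Neural Architecture Search (NAS)
  • 10. Face Recognition
  • 11. Person Re-identification
  • 12. Dense Crowd Detection
  • 13. Medical Image Processing
  • 14. Image Style Transfer
  • 15. Low-Level Vision (denoising, deraining, restoration, deblurring, etc.)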

Others

Model Code Reproductions

  • MLP
  • Attention
  • Transformer

Awesome Survey

Paper Explanations

Papers (latest and most popular)

Microsoft's Leaderboard-Topping Transformer Models

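  • Beyond Swin: a Transformer that tops the leaderboards of three major vision tasks. Microsoft's new work: Focal Self-Attention
  • CSWin Transformer: A Vision Transformer Backbone with Cross-Shaped Windows (an advanced version of Swin Transformer)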
  • Dynamic Head: Unifying Object Detection Heads with Attentions
  • [Swin Transformer] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
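Swin Transformer computes self-attention inside non-overlapping local windows and alternates this with a shifted window configuration so that information can cross window boundaries. The window partitioning and cyclic shift can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the input is a single (H, W, C) feature map whose sides are divisible by the window size, and the attention computation itself, as well as the masking Swin applies inside shifted windows, is omitted.

```python
import numpy as np


def window_partition(x: np.ndarray, window_size: int) -> np.ndarray:
    """Split an (H, W, C) feature map into non-overlapping windows.

    Returns an array of shape (num_windows, window_size, window_size, C),
    with windows ordered row by row over the window grid.
    """
    H, W, C = x.shape
    if H % window_size or W % window_size:
        raise ValueError("H and W must be divisible by window_size")
    # (rows of windows, row inside window, cols of windows, col inside window, C)
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    # Bring the two window-grid axes together, then flatten them into one axis.
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, window_size, window_size, C)


def shifted_window_partition(x: np.ndarray, window_size: int, shift: int) -> np.ndarray:
    """Cyclically shift the map by -shift along H and W, then partition.

    After the roll, the new windows straddle the boundaries of the
    unshifted windows; pixels that wrap around end up in the last windows.
    """
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window_size)
```

Partitioning an 8×8 map with `window_size=4` yields four 4×4 windows; with `shift=2`, the first shifted window covers rows and columns 2 to 5 of the original map.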

ViT Variants

Modified Operators

Local & Hierarchical & multi-scale

Transformer + Convolution Hybrids

Transformer Model Compression & Lightweight Design

DETR Variants

Transformers Applied to Other Tasks

1. Object Detection

2. Super-Resolution

  • [TTSR] Learning Texture Transformer Network for Image Super-Resolution (CVPR)

3. Image Segmentation / Semantic Segmentation

4. GAN/Generative/Adversarial

  • [GANsformer] Generative Adversarial Transformers

  • [TransGAN] TransGAN: Two Transformers Can Make One Strong GAN

  • [AOT-GAN] Aggregated Contextual Transformations for High-Resolution Image Inpainting

5. Tracking

6. Video

7. Multimodal

  • ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
    • Paper: https://arxiv.org/abs/2102.03334
    • Code: https://github.com/dandelin/ViLT
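As its title says, ViLT drops convolutional feature extractors and region supervision; images enter the model through a ViT-style linear projection of flattened patches. A minimal NumPy sketch of such a patch embedding follows. It assumes an (H, W, C) image whose sides are divisible by the patch size; `patch_embed` is a hypothetical helper, and the random weights in the check stand in for learned parameters. The class token and position embeddings that ViT-style models add afterwards are left out.

```python
import numpy as np


def patch_embed(image: np.ndarray, patch_size: int,
                weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Flatten non-overlapping patches of an (H, W, C) image and project them.

    weight: (patch_size * patch_size * C, D), bias: (D,)
    Returns (num_patches, D) token embeddings in row-major patch order.
    """
    H, W, C = image.shape
    p = patch_size
    if H % p or W % p:
        raise ValueError("image height and width must be divisible by patch_size")
    # Group pixels by patch: (patch row, patch col, row in patch, col in patch, C)
    patches = image.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(-1, p * p * C)  # (num_patches, p*p*C)
    return patches @ weight + bias            # (num_patches, D)
```

For example, a 32×32×3 image with `patch_size=4` and a projection to 8 dimensions yields 64 tokens of size 8.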

8. Human Pose Estimation

9. Neural Architecture Search (NAS)

10. Face Recognition

11. Person Re-identification

12. Dense Crowd Detection

  • [TransCrowd] TransCrowd: Weakly-Supervised Crowd Counting with Transformer

13. Medical Image Processing

14. Image Style Transfer

  • StyTr2: Unbiased Image Style Transfer with Transformers

15. Low-Level Vision (denoising, deraining, restoration, deblurring, etc.)

Others

Model Code Reproductions

MLP

  • MLP-Mixer

  • ResMLP

  • RepMLP

  • gMLP

  • S²-MLP (Spatial-Shift MLP)

  • ViP (Vision Permute-MLP)
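MLP-Mixer replaces attention with two kinds of MLPs: a token-mixing MLP applied across patches (shared over channels) and a channel-mixing MLP applied to each patch, each preceded by layer normalization and wrapped in a skip connection. One such layer can be sketched in NumPy as below. This is an illustration, not a reproduction of the reference code: parameters are passed as plain `(w1, b1, w2, b2)` tuples (a hypothetical convention), LayerNorm has no learnable scale or shift, and GELU uses the tanh approximation.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Normalize over the last (channel) axis; learnable scale/shift omitted.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def gelu(x: np.ndarray) -> np.ndarray:
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def mlp(x, w1, b1, w2, b2):
    # Two dense layers with a GELU in between, applied along the last axis.
    return gelu(x @ w1 + b1) @ w2 + b2


def mixer_layer(x: np.ndarray, token_params, channel_params) -> np.ndarray:
    """One Mixer layer on x of shape (num_patches, channels).

    token_params:   (w1 (N, Ds), b1 (Ds,), w2 (Ds, N), b2 (N,))
    channel_params: (w1 (C, Dc), b1 (Dc,), w2 (Dc, C), b2 (C,))
    """
    # Token mixing: transpose so each row is one channel across all patches.
    y = layer_norm(x).T                              # (C, N)
    x = x + mlp(y, *token_params).T                  # back to (N, C), residual
    # Channel mixing: each patch's channel vector goes through the MLP.
    x = x + mlp(layer_norm(x), *channel_params)      # (N, C), residual
    return x
```

With all parameters set to zero, both MLPs output zero and the layer reduces to the identity through its residual connections, which is a quick sanity check on the wiring.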

Attention

  • To be updated

Transformer

  • To be updated