ICCV2023-Papers-with-Code

A collection of ICCV 2023 papers and open-source projects

ICCV2021-Papers-with-Code

A collection of ICCV 2021 papers and open-source projects (papers with code)!

1617 papers accepted - 25.9% acceptance rate

ICCV 2021 accepted paper IDs: https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRfaTmsNweuaA0Gjyu58H_Cx56pGwFhcTYII0u1pg0U7MbhlgY0R6Y-BbK3xFhAiwGZ26u3TAtN5MnS/pubhtml

Note 1: Issues sharing ICCV 2021 papers and open-source projects are welcome!

Note 2: For papers from previous years' top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision

[ICCV 2021 Papers and Open-Source Projects: Table of Contents]

  • Backbone
  • Transformer
  • Performance-Boosting Tricks
  • GAN
  • NAS
  • NeRF
  • Loss
  • Zero-Shot Learning
  • Few-Shot Learning
  • Long-tailed
  • Vision and Language
  • Unsupervised/Self-Supervised
  • Multi-Label Image Recognition
  • 2D Object Detection
  • Semantic Segmentation
  • Instance Segmentation
  • Medical Image Segmentation
  • Video Object Segmentation
  • Few-shot Segmentation
  • Human Motion Segmentation
  • Object Tracking
  • 3D Point Cloud
  • 3D Object Detection
  • 3D Semantic Segmentation
  • 3D Instance Segmentation
  • 3D Multi-Object Tracking
  • Point Cloud Denoising
  • Point Cloud Registration
  • Point Cloud Completion
  • Radar Semantic Segmentation
  • Image Restoration
  • Super-Resolution
  • Denoising
  • Medical Image Denoising
  • Deblurring
  • Shadow Removal
  • Video Frame Interpolation
  • Video Inpainting
  • Person Re-identification
  • Person Search
  • 2D/3D Human Pose Estimation
  • 6D Object Pose Estimation
  • 3D Head Reconstruction
  • Face Recognition
  • Facial Expression Recognition
  • Action Recognition
  • Temporal Action Localization
  • Action Detection
  • Group Activity Recognition
  • Sign Language Recognition
  • Text Detection
  • Text Recognition
  • Text Replacement
  • Visual Question Answering (VQA)
  • Adversarial Attack
  • Depth Estimation
  • Gaze Estimation
  • Crowd Counting
  • Lane Detection
  • Trajectory Prediction
  • Anomaly Detection
  • Scene Graph Generation
  • Image Editing
  • Image Synthesis
  • Image Retrieval
  • 3D Reconstruction
  • Video Stabilization
  • Fine-Grained Recognition
  • Style Transfer
  • Neural Painting
  • Feature Matching
  • Semantic Correspondence
  • Edge Detection
  • Camera Calibration
  • Image Quality Assessment
  • Metric Learning
  • Unsupervised Domain Adaptation
  • Video Rescaling
  • Hand-Object Interaction
  • Vision-and-Language Navigation
  • Datasets
  • Others

Backbone

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

  • Paper(Oral): https://arxiv.org/abs/2102.12122
  • Code: https://github.com/whai362/PVT

AutoFormer: Searching Transformers for Visual Recognition

  • Paper: https://arxiv.org/abs/2107.00651
  • Code: https://github.com/microsoft/AutoML

Bias Loss for Mobile Neural Networks

  • Paper: https://arxiv.org/abs/2107.11170
  • Code: None

Vision Transformer with Progressive Sampling

  • Paper: https://arxiv.org/abs/2108.01684
  • Code: https://github.com/yuexy/PS-ViT

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

  • Paper: https://arxiv.org/abs/2101.11986
  • Code: https://github.com/yitu-opensource/T2T-ViT

Rethinking Spatial Dimensions of Vision Transformers

  • Paper: https://arxiv.org/abs/2103.16302
  • Code: https://github.com/naver-ai/pit

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

  • Paper: https://arxiv.org/abs/2103.14030
  • Code: https://github.com/microsoft/Swin-Transformer

Conformer: Local Features Coupling Global Representations for Visual Recognition

  • Paper: https://arxiv.org/abs/2105.03889
  • Code: https://github.com/pengzhiliang/Conformer

MicroNet: Improving Image Recognition with Extremely Low FLOPs

  • Paper: https://arxiv.org/abs/2108.05894
  • Code: https://github.com/liyunsheng13/micronet

Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition

  • Paper: https://arxiv.org/abs/2102.01063
  • Code: https://github.com/idstcv/ZenNAS

Visual Transformer

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

  • Paper: https://arxiv.org/abs/2103.14030
  • Code: https://github.com/microsoft/Swin-Transformer

An Empirical Study of Training Self-Supervised Vision Transformers

  • Paper(Oral): https://arxiv.org/abs/2104.02057
  • MoCo v3 Code: None

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

  • Paper(Oral): https://arxiv.org/abs/2102.12122
  • Code: https://github.com/whai362/PVT

Group-Free 3D Object Detection via Transformers

  • Paper: https://arxiv.org/abs/2104.00678
  • Code: None

Spatial-Temporal Transformer for Dynamic Scene Graph Generation

  • Paper: https://arxiv.org/abs/2107.12309
  • Code: None

Rethinking and Improving Relative Position Encoding for Vision Transformer

  • Paper: https://arxiv.org/abs/2107.14222
  • Code: https://github.com/microsoft/AutoML/tree/main/iRPE

Emerging Properties in Self-Supervised Vision Transformers

  • Paper: https://arxiv.org/abs/2104.14294
  • Code: https://github.com/facebookresearch/dino

Learning Spatio-Temporal Transformer for Visual Tracking

  • Paper: https://arxiv.org/abs/2103.17154
  • Code: https://github.com/researchmm/Stark

Fast Convergence of DETR with Spatially Modulated Co-Attention

  • Paper: https://arxiv.org/abs/2101.07448
  • Code: https://github.com/abc403/SMCA-replication

Vision Transformer with Progressive Sampling

  • Paper: https://arxiv.org/abs/2108.01684
  • Code: https://github.com/yuexy/PS-ViT

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

  • Paper: https://arxiv.org/abs/2101.11986
  • Code: https://github.com/yitu-opensource/T2T-ViT

Rethinking Spatial Dimensions of Vision Transformers

  • Paper: https://arxiv.org/abs/2103.16302
  • Code: https://github.com/naver-ai/pit

The Right to Talk: An Audio-Visual Transformer Approach

  • Paper: https://arxiv.org/abs/2108.03256
  • Code: None

Joint Inductive and Transductive Learning for Video Object Segmentation

  • Paper: https://arxiv.org/abs/2108.03679
  • Code: https://github.com/maoyunyao/JOINT

Conformer: Local Features Coupling Global Representations for Visual Recognition

  • Paper: https://arxiv.org/abs/2105.03889
  • Code: https://github.com/pengzhiliang/Conformer

Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer

  • Paper: https://arxiv.org/abs/2108.03032
  • Code: https://github.com/zhiheLu/CWT-for-FSS

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction

  • Paper: https://arxiv.org/abs/2108.03798
  • Code: https://github.com/wzmsltw/PaintTransformer

Conditional DETR for Fast Training Convergence

  • Paper: https://arxiv.org/abs/2108.06152
  • Code: https://github.com/Atten4Vis/ConditionalDETR

MUSIQ: Multi-scale Image Quality Transformer

  • Paper: https://arxiv.org/abs/2108.05997
  • Code: https://github.com/google-research/google-research/tree/master/musiq

SOTR: Segmenting Objects with Transformers

  • Paper: https://arxiv.org/abs/2108.06747
  • Code: https://github.com/easton-cau/SOTR

PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers

  • Paper(Oral): https://arxiv.org/abs/2108.08839
  • Code: https://github.com/yuxumin/PoinTr

SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer

  • Paper: https://arxiv.org/abs/2108.04444
  • Code: https://github.com/AllenXiangX/SnowflakeNet

Improving 3D Object Detection with Channel-wise Transformer

  • Paper: https://arxiv.org/abs/2108.10723
  • Code: https://github.com/hlsheng1/CT3D

TransFER: Learning Relation-aware Facial Expression Representations with Transformers

  • Paper: https://arxiv.org/abs/2108.11116
  • Code: None

GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer

  • Paper: https://arxiv.org/abs/2108.12630
  • Code: https://github.com/xueyee/GroupFormer

Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction

  • Paper: https://arxiv.org/abs/2109.00512
  • Code: https://github.com/facebookresearch/co3d
  • Dataset: https://github.com/facebookresearch/co3d

Voxel Transformer for 3D Object Detection

  • Paper: https://arxiv.org/abs/2109.02497
  • Code: None

3D Human Texture Estimation from a Single Image with Transformers

  • Homepage: https://www.mmlab-ntu.com/project/texformer/
  • Paper(Oral): https://arxiv.org/abs/2109.02563
  • Code: None

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting

  • Paper: https://arxiv.org/abs/2109.02974
  • Code: https://github.com/ruiliu-ai/FuseFormer

CTRL-C: Camera calibration TRansformer with Line-Classification

  • Paper: https://arxiv.org/abs/2109.02259
  • Code: https://github.com/jwlee-vcl/CTRL-C

An End-to-End Transformer Model for 3D Object Detection

  • Homepage: https://facebookresearch.github.io/3detr/
  • Paper: https://arxiv.org/abs/2109.08141
  • Code: https://github.com/facebookresearch/3detr

Eformer: Edge Enhancement based Transformer for Medical Image Denoising

  • Paper: https://arxiv.org/abs/2109.08044
  • Code: None

PnP-DETR: Towards Efficient Visual Analysis with Transformers

  • Paper: https://arxiv.org/abs/2109.07036
  • Code: https://github.com/twangnh/pnp-detr

Transformer-based Dual Relation Graph for Multi-label Image Recognition

  • Paper: https://arxiv.org/abs/2110.04722
  • Code: None

Performance-Boosting Tricks

FaPN: Feature-aligned Pyramid Network for Dense Image Prediction

  • Paper: https://arxiv.org/abs/2108.07058
  • Code: https://github.com/EMI-Group/FaPN

Unifying Nonlocal Blocks for Neural Networks

  • Paper: https://arxiv.org/abs/2108.02451
  • Code: https://github.com/zh460045050/SNL_ICCV2021

Towards Learning Spatially Discriminative Feature Representations

  • Paper: https://arxiv.org/abs/2109.01359
  • Code: None

GAN

Labels4Free: Unsupervised Segmentation using StyleGAN

  • Homepage: https://rameenabdal.github.io/Labels4Free/
  • Paper: https://arxiv.org/abs/2103.14968

GNeRF: GAN-based Neural Radiance Field without Posed Camera

  • Paper(Oral): https://arxiv.org/abs/2103.15606
  • Code: https://github.com/MQ66/gnerf

EigenGAN: Layer-Wise Eigen-Learning for GANs

  • Paper: https://arxiv.org/abs/2104.12476
  • Code: https://github.com/LynnHo/EigenGAN-Tensorflow

From Continuity to Editability: Inverting GANs with Consecutive Images

  • Paper: https://arxiv.org/abs/2107.13812
  • Code: https://github.com/Qingyang-Xu/InvertingGANs_with_ConsecutiveImgs

Sketch Your Own GAN

  • Homepage: https://peterwang512.github.io/GANSketching/
  • Paper: https://arxiv.org/abs/2108.02774
  • Code: https://github.com/peterwang512/GANSketching

Manifold Matching via Deep Metric Learning for Generative Modeling

  • Paper: https://arxiv.org/abs/2106.10777
  • Code: https://github.com/dzld00/pytorch-manifold-matching

Dual Projection Generative Adversarial Networks for Conditional Image Generation

  • Paper: https://arxiv.org/abs/2108.09016
  • Code: None

GAN Inversion for Out-of-Range Images with Geometric Transformations

  • Paper: https://arxiv.org/abs/2108.08998
  • Code: None

ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement

  • Homepage: https://yuval-alaluf.github.io/restyle-encoder/
  • Paper: https://arxiv.org/abs/2104.02699
  • Code: https://github.com/yuval-alaluf/restyle-encoder

StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

  • Paper(Oral): https://arxiv.org/abs/2103.17249
  • Code: https://github.com/orpatashnik/StyleCLIP

Image Synthesis via Semantic Composition

  • Homepage: https://shepnerd.github.io/scg/
  • Paper: https://arxiv.org/abs/2109.07053
  • Code: https://github.com/dvlab-research/SCGAN

NAS

AutoFormer: Searching Transformers for Visual Recognition

  • Paper: https://arxiv.org/abs/2107.00651
  • Code: https://github.com/microsoft/AutoML

BN-NAS: Neural Architecture Search with Batch Normalization

  • Paper: https://arxiv.org/abs/2108.07375
  • Code: https://github.com/bychen515/BNNAS

Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition

  • Paper: https://arxiv.org/abs/2102.01063
  • Code: https://github.com/idstcv/ZenNAS

NeRF

GNeRF: GAN-based Neural Radiance Field without Posed Camera

  • Paper(Oral): https://arxiv.org/abs/2103.15606
  • Code: https://github.com/MQ66/gnerf

KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs

  • Paper: https://arxiv.org/abs/2103.13744
  • Code: https://github.com/creiser/kilonerf

In-Place Scene Labelling and Understanding with Implicit Scene Representation

  • Homepage: https://shuaifengzhi.com/Semantic-NeRF/
  • Paper(Oral): https://arxiv.org/abs/2103.15875

Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis

  • Homepage: https://ajayj.com/dietnerf
  • Paper(DietNeRF): https://arxiv.org/abs/2104.00677

BARF: Bundle-Adjusting Neural Radiance Fields

  • Homepage: https://chenhsuanlin.bitbucket.io/bundle-adjusting-NeRF/
  • Paper(Oral): https://arxiv.org/abs/2104.06405
  • Code: https://github.com/chenhsuanlin/bundle-adjusting-NeRF

Self-Calibrating Neural Radiance Fields

  • Paper: https://arxiv.org/abs/2108.13826
  • Code: https://github.com/POSTECH-CVLab/SCNeRF

Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction

  • Paper: https://arxiv.org/abs/2109.00512
  • Code: https://github.com/facebookresearch/co3d
  • Dataset: https://github.com/facebookresearch/co3d

Neural Articulated Radiance Field

  • Paper: https://arxiv.org/abs/2104.03110
  • Code: https://github.com/nogu-atsu/NARF

NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo

  • Paper(Oral): https://arxiv.org/abs/2109.01129
  • Code: https://github.com/weiyithu/NerfingMVS

SNARF: Differentiable Forward Skinning for Animating Non-rigid Neural Implicit Shapes

  • Homepage: https://xuchen-ethz.github.io/snarf
  • Paper: https://arxiv.org/abs/2104.03953
  • Code: https://github.com/xuchen-ethz/snarf

CodeNeRF: Disentangled Neural Radiance Fields for Object Categories

  • Paper: https://arxiv.org/abs/2109.01750
  • Code: https://github.com/wayne1123/code-nerf

PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering

  • Paper: https://openaccess.thecvf.com/content/ICCV2021/html/Ren_PIRenderer_Controllable_Portrait_Image_Generation_via_Semantic_Neural_Rendering_ICCV_2021_paper.html
  • Code: https://github.com/RenYurui/PIRender

Loss

Rank & Sort Loss for Object Detection and Instance Segmentation

  • Paper(Oral): https://arxiv.org/abs/2107.11669
  • Code: https://github.com/kemaloksuz/RankSortLoss

Bias Loss for Mobile Neural Networks

  • Paper: https://arxiv.org/abs/2107.11170
  • Code: None

A Robust Loss for Point Cloud Registration

  • Paper: https://arxiv.org/abs/2108.11682
  • Code: None

Reconcile Prediction Consistency for Balanced Object Detection

  • Paper: https://arxiv.org/abs/2108.10809
  • Code: None

Influence-Balanced Loss for Imbalanced Visual Classification

  • Paper: https://arxiv.org/abs/2110.02444
  • Code: https://github.com/pseulki/IB-Loss

Zero-Shot Learning

FREE: Feature Refinement for Generalized Zero-Shot Learning

  • Paper: https://arxiv.org/abs/2107.13807
  • Code: https://github.com/shiming-chen/FREE

Discriminative Region-based Multi-Label Zero-Shot Learning

  • Paper: https://arxiv.org/abs/2108.09301
  • Code: https://arxiv.org/abs/2108.09301

Semantics Disentangling for Generalized Zero-Shot Learning

  • Paper: https://arxiv.org/pdf/2101.07978
  • Code: https://github.com/uqzhichen/SDGZSL

Few-Shot Learning

Relational Embedding for Few-Shot Classification

  • Paper: https://arxiv.org/abs/2108.09666
  • Code: https://github.com/dahyun-kang/renet

Few-Shot and Continual Learning with Attentive Independent Mechanisms

  • Paper: https://arxiv.org/abs/2107.14053
  • Code: https://github.com/huang50213/AIM-Fewshot-Continual

Few Shot Visual Relationship Co-Localization

  • Homepage: https://vl2g.github.io/projects/vrc/
  • Paper: https://arxiv.org/abs/2108.11618

Long-tailed

Parametric Contrastive Learning

  • Paper: https://arxiv.org/abs/2107.12028
  • Code: https://github.com/jiequancui/Parametric-Contrastive-Learning

Influence-Balanced Loss for Imbalanced Visual Classification

  • Paper: https://arxiv.org/abs/2110.02444
  • Code: https://github.com/pseulki/IB-Loss

Vision and Language

VLGrammar: Grounded Grammar Induction of Vision and Language

  • Paper: https://arxiv.org/abs/2103.12975
  • Code: https://github.com/evelinehong/VLGrammar

Unsupervised/Self-Supervised

An Empirical Study of Training Self-Supervised Vision Transformers

  • Paper(Oral): https://arxiv.org/abs/2104.02057
  • MoCo v3 Code: None

DetCo: Unsupervised Contrastive Learning for Object Detection

  • Paper: https://arxiv.org/abs/2102.04803
  • Code: https://github.com/xieenze/DetCo

Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization

  • Paper: https://arxiv.org/abs/2108.02183
  • Code: None

Improving Contrastive Learning by Visualizing Feature Transformation

  • Paper(Oral): https://arxiv.org/abs/2108.02982
  • Code: https://github.com/DTennant/CL-Visualizing-Feature-Transformation

Self-Supervised Visual Representations Learning by Contrastive Mask Prediction

  • Paper: https://arxiv.org/abs/2108.08012
  • Code: None

Temporal Knowledge Consistency for Unsupervised Visual Representation Learning

  • Paper: https://arxiv.org/abs/2108.10668
  • Code: None

MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving

  • Paper: https://arxiv.org/abs/2108.12178
  • Code: https://github.com/KaiChen1998/MultiSiam

Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds

  • Homepage: https://siyuanhuang.com/STRL/
  • Paper: https://arxiv.org/abs/2109.00179
  • Code: https://github.com/yichen928/STRL

Self-supervised Product Quantization for Deep Unsupervised Image Retrieval

  • Paper: https://arxiv.org/abs/2109.02244
  • Code: https://github.com/youngkyunJang/SPQ

Self-Supervised Representation Learning from Flow Equivariance

  • Paper: https://arxiv.org/abs/2101.06553
  • Code: None

Multi-Label Image Recognition

Residual Attention: A Simple but Effective Method for Multi-Label Recognition

  • Paper: https://arxiv.org/abs/2108.02456
  • Code: https://github.com/Kevinz-code/CSRA

2D Object Detection

DetCo: Unsupervised Contrastive Learning for Object Detection

  • Paper: https://arxiv.org/abs/2102.04803
  • Code: https://github.com/xieenze/DetCo

Detecting Invisible People

  • Homepage: http://www.cs.cmu.edu/~tkhurana/invisible.htm
  • Paper: https://arxiv.org/abs/2012.08419

Active Learning for Deep Object Detection via Probabilistic Modeling

  • Paper: https://arxiv.org/abs/2103.16130
  • Code: None

Conditional Variational Capsule Network for Open Set Recognition

  • Paper: https://arxiv.org/abs/2104.09159
  • Code: https://github.com/guglielmocamporese/cvaecaposr

MDETR: Modulated Detection for End-to-End Multi-Modal Understanding

  • Homepage: https://ashkamath.github.io/mdetr_page/
  • Paper(Oral): https://arxiv.org/abs/2104.12763
  • Code: https://github.com/ashkamath/mdetr

Rank & Sort Loss for Object Detection and Instance Segmentation

  • Paper(Oral): https://arxiv.org/abs/2107.11669
  • Code: https://github.com/kemaloksuz/RankSortLoss

SimROD: A Simple Adaptation Method for Robust Object Detection

  • Paper(Oral): https://arxiv.org/abs/2107.13389
  • Code: None

GraphFPN: Graph Feature Pyramid Network for Object Detection

  • Paper: https://arxiv.org/abs/2108.00580
  • Code: None

Fast Convergence of DETR with Spatially Modulated Co-Attention

  • Paper: https://arxiv.org/abs/2101.07448
  • Code: https://github.com/abc403/SMCA-replication

Conditional DETR for Fast Training Convergence

  • Paper: https://arxiv.org/abs/2108.06152
  • Code: https://github.com/Atten4Vis/ConditionalDETR

TOOD: Task-aligned One-stage Object Detection

  • Paper(Oral): https://arxiv.org/abs/2108.07755
  • Code: https://github.com/fcjian/TOOD

Reconcile Prediction Consistency for Balanced Object Detection

  • Paper: https://arxiv.org/abs/2108.10809
  • Code: None

Mutual Supervision for Dense Object Detection

  • Paper: https://arxiv.org/abs/2109.05986
  • Code: https://github.com/MCG-NJU/MuSu-Detection

PnP-DETR: Towards Efficient Visual Analysis with Transformers

  • Paper: https://arxiv.org/abs/2109.07036
  • Code: https://github.com/twangnh/pnp-detr

Deep Structured Instance Graph for Distilling Object Detectors

  • Paper: https://arxiv.org/abs/2109.12862
  • Code: https://github.com/dvlab-research/Dsig

Semi-Supervised Object Detection

End-to-End Semi-Supervised Object Detection with Soft Teacher

  • Paper: https://arxiv.org/abs/2106.09018
  • Code: None

Rotated Object Detection

Oriented R-CNN for Object Detection

  • Paper: https://arxiv.org/abs/2108.05699
  • Code: https://github.com/jbwang1997/OBBDetection

Few-Shot Object Detection

DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection

  • Paper: https://arxiv.org/abs/2108.09017
  • Code: https://github.com/er-muyue/DeFRCN

Semantic Segmentation

Personalized Image Semantic Segmentation

  • Paper: https://arxiv.org/abs/2107.13978
  • Code: https://github.com/zhangyuygss/PIS
  • Dataset: https://github.com/zhangyuygss/PIS

Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation

  • Paper(Oral): https://arxiv.org/abs/2107.11264
  • Code: https://github.com/shjung13/Standardized-max-logits

Enhanced Boundary Learning for Glass-like Object Segmentation

  • Paper: https://arxiv.org/abs/2103.15734
  • Code: https://github.com/hehao13/EBLNet

Self-Regulation for Semantic Segmentation

  • Paper: https://arxiv.org/abs/2108.09702
  • Code: https://github.com/dongzhang89/SR-SS

Mining Contextual Information Beyond Image for Semantic Segmentation

  • Paper: https://arxiv.org/abs/2108.11819
  • Code: https://github.com/CharlesPikachu/mcibi

ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation

  • Paper: https://arxiv.org/abs/2108.12382
  • Code: https://github.com/SegmentationBLWX/sssegmentation

Scaling up instance annotation via label propagation

  • Homepage: http://scaling-anno.csail.mit.edu/
  • Paper: https://arxiv.org/abs/2110.02277
  • Code: None

Unsupervised Domain Adaptation Semantic Segmentation

Multi-Anchor Active Domain Adaptation for Semantic Segmentation

  • Paper(Oral): https://arxiv.org/abs/2108.08012
  • Code: https://github.com/munanning/MADA

Generalize then Adapt: Source-Free Domain Adaptive Semantic Segmentation

  • Homepage: https://sites.google.com/view/sfdaseg
  • Paper: https://arxiv.org/abs/2108.11249

Few-Shot Semantic Segmentation

Learning Meta-class Memory for Few-Shot Semantic Segmentation

  • Paper: https://arxiv.org/abs/2108.02958
  • Code: None

Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer

  • Paper: https://arxiv.org/abs/2108.03032
  • Code: https://github.com/zhiheLu/CWT-for-FSS

Semi-supervised Semantic Segmentation

Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation

  • Paper: https://arxiv.org/abs/2107.11787
  • Code: None

Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation

  • Paper(Oral): https://arxiv.org/abs/2107.11279
  • Code: https://github.com/CVMI-Lab/DARS

Pixel Contrastive-Consistent Semi-Supervised Semantic Segmentation

  • Paper: https://arxiv.org/abs/2108.09025
  • Code: None

Weakly Supervised Semantic Segmentation

Complementary Patch for Weakly Supervised Semantic Segmentation

  • Paper: https://arxiv.org/abs/2108.03852
  • Code: None

Unsupervised Segmentation

Labels4Free: Unsupervised Segmentation using StyleGAN

  • Homepage: https://rameenabdal.github.io/Labels4Free/
  • Paper: https://arxiv.org/abs/2103.14968

Instance Segmentation

Instances as Queries

  • Paper: https://arxiv.org/abs/2105.01928
  • Code: https://github.com/hustvl/QueryInst

Crossover Learning for Fast Online Video Instance Segmentation

  • Paper: https://arxiv.org/abs/2104.05970
  • Code: https://github.com/hustvl/CrossVIS

Rank & Sort Loss for Object Detection and Instance Segmentation

  • Paper(Oral): https://arxiv.org/abs/2107.11669
  • Code: https://github.com/kemaloksuz/RankSortLoss

SOTR: Segmenting Objects with Transformers

  • Paper: https://arxiv.org/abs/2108.06747
  • Code: https://github.com/easton-cau/SOTR

Scaling up instance annotation via label propagation

  • Homepage: http://scaling-anno.csail.mit.edu/
  • Paper: https://arxiv.org/abs/2110.02277
  • Code: None

Medical Image Segmentation

Recurrent Mask Refinement for Few-Shot Medical Image Segmentation

  • Paper: https://arxiv.org/abs/2108.00622
  • Code: https://github.com/uci-cbcl/RP-Net

Video Object Segmentation

Hierarchical Memory Matching Network for Video Object Segmentation

  • Paper: https://arxiv.org/abs/2109.11404
  • Code: https://github.com/Hongje/HMMN

Full-Duplex Strategy for Video Object Segmentation

  • Homepage: http://dpfan.net/FSNet/
  • Paper: https://arxiv.org/abs/2108.03151
  • Code: https://github.com/GewelsJI/FSNet

Joint Inductive and Transductive Learning for Video Object Segmentation

  • Paper: https://arxiv.org/abs/2108.03679
  • Code: https://github.com/maoyunyao/JOINT

Few-shot Segmentation

Mining Latent Classes for Few-shot Segmentation

  • Paper(Oral): https://arxiv.org/abs/2103.15402
  • Code: https://github.com/LiheYoung/MiningFSS

Human Motion Segmentation

Graph Constrained Data Representation Learning for Human Motion Segmentation

  • Paper: https://arxiv.org/abs/2107.13362
  • Code: None

Object Tracking

Learning to Track Objects from Unlabeled Videos

  • Paper: https://arxiv.org/abs/2108.12711
  • Code: https://github.com/VISION-SJTU/USOT

Learning Spatio-Temporal Transformer for Visual Tracking

  • Paper: https://arxiv.org/abs/2103.17154
  • Code: https://github.com/researchmm/Stark

Learning to Adversarially Blur Visual Object Tracking

  • Paper: https://arxiv.org/abs/2107.12085
  • Code: https://github.com/tsingqguo/ABA

HiFT: Hierarchical Feature Transformer for Aerial Tracking

  • Paper: https://arxiv.org/abs/2108.00202
  • Code: https://github.com/vision4robotics/HiFT

Learn to Match: Automatic Matching Network Design for Visual Tracking

  • Paper: https://arxiv.org/abs/2108.00803
  • Code: https://github.com/JudasDie/SOTS

Saliency-Associated Object Tracking

  • Paper: https://arxiv.org/abs/2108.03637
  • Code: https://github.com/ZikunZhou/SAOT.git

RGBD Object Tracking

DepthTrack: Unveiling the Power of RGBD Tracking

  • Paper: https://arxiv.org/abs/2108.13962
  • Code: https://github.com/xiaozai/DeT
  • Dataset: https://github.com/xiaozai/DeT

3D Point Cloud

Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds

  • Homepage: https://siyuanhuang.com/STRL/
  • Paper: https://arxiv.org/abs/2109.00179
  • Code: https://github.com/yichen928/STRL

Unsupervised Point Cloud Pre-Training via View-Point Occlusion, Completion

  • Homepage: https://hansen7.github.io/OcCo/
  • Paper: https://arxiv.org/abs/2010.01089
  • Code: https://github.com/hansen7/OcCo

DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation

  • Paper: https://arxiv.org/abs/2108.04023
  • Code: None

Adaptive Graph Convolution for Point Cloud Analysis

  • Paper: https://arxiv.org/abs/2108.08035
  • Code: https://github.com/hrzhou2/AdaptConv-master

3D Object Detection

Group-Free 3D Object Detection via Transformers

  • Paper: https://arxiv.org/abs/2104.00678
  • Code: None

Improving 3D Object Detection with Channel-wise Transformer

  • Paper: https://arxiv.org/abs/2108.10723
  • Code: https://github.com/hlsheng1/CT3D

AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection

  • Paper: https://arxiv.org/abs/2108.11127
  • Code: https://github.com/zongdai/AutoShape

4D-Net for Learned Multi-Modal Alignment

  • Paper: https://arxiv.org/abs/2109.01066
  • Code: None

Voxel Transformer for 3D Object Detection

  • Paper: https://arxiv.org/abs/2109.02497
  • Code: None

Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection

  • Paper: https://arxiv.org/abs/2109.02499
  • Code: None

An End-to-End Transformer Model for 3D Object Detection

  • Homepage: https://facebookresearch.github.io/3detr/
  • Paper: https://arxiv.org/abs/2109.08141
  • Code: https://github.com/facebookresearch/3detr

RangeDet: In Defense of Range View for LiDAR-based 3D Object Detection

  • Paper: https://arxiv.org/abs/2103.10039
  • Code: https://github.com/TuSimple/RangeDet

Geometry-based Distance Decomposition for Monocular 3D Object Detection

  • Paper: https://arxiv.org/abs/2104.03775
  • Code: https://github.com/Rock-100/MonoDet

3D Semantic Segmentation

ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation

  • Paper: https://arxiv.org/abs/2107.11769
  • Code: None

Learning with Noisy Labels for Robust Point Cloud Segmentation

  • Homepage: https://shuquanye.com/PNAL_website/
  • Paper(Oral): https://arxiv.org/abs/2107.14230

VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation

  • Paper(Oral): https://arxiv.org/abs/2107.13824
  • Code: https://github.com/hzykent/VMNet

Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation

  • Paper: https://arxiv.org/abs/2107.14724
  • Code: https://github.com/leolyj/DsCML

DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation

  • Paper: https://arxiv.org/abs/2108.04023
  • Code: None

Adaptive Graph Convolution for Point Cloud Analysis

  • Paper: https://arxiv.org/abs/2108.08035
  • Code: https://github.com/hrzhou2/AdaptConv-master

Perception-Aware Multi-Sensor Fusion for 3D LiDAR Semantic Segmentation

  • Paper: https://arxiv.org/abs/2106.15277
  • Code: https://github.com/ICEORY/PMF

3D Instance Segmentation

Hierarchical Aggregation for 3D Instance Segmentation

  • Paper: https://arxiv.org/abs/2108.02350
  • Code: https://github.com/hustvl/HAIS

Instance Segmentation in 3D Scenes Using Semantic Superpoint Tree Networks

  • Paper: https://openaccess.thecvf.com/content/ICCV2021/html/Liang_Instance_Segmentation_in_3D_Scenes_Using_Semantic_Superpoint_Tree_Networks_ICCV_2021_paper.html
  • Code: https://github.com/Gorilla-Lab-SCUT/SSTNet

3D Multi-Object Tracking

Exploring Simple 3D Multi-Object Tracking for Autonomous Driving

  • Paper: https://arxiv.org/abs/2108.10312
  • Code: https://github.com/qcraftai/simtrack

Point Cloud Denoising

Score-Based Point Cloud Denoising

  • Paper: https://arxiv.org/abs/2107.10981
  • Code: None

Point Cloud Registration

HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration

  • Homepage: https://ispc-group.github.io/hregnet
  • Paper: https://arxiv.org/abs/2107.11992
  • Code: https://github.com/ispc-lab/HRegNet

A Robust Loss for Point Cloud Registration

  • Paper: https://arxiv.org/abs/2108.11682
  • Code: None

Point Cloud Completion

PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers

  • Paper(Oral): https://arxiv.org/abs/2108.08839
  • Code: https://github.com/yuxumin/PoinTr

SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer

  • Paper: https://arxiv.org/abs/2108.04444
  • Code: https://github.com/AllenXiangX/SnowflakeNet

Radar Semantic Segmentation

Multi-View Radar Semantic Segmentation

  • Paper: https://arxiv.org/abs/2103.16214
  • Code: https://github.com/valeoai/MVRSS

Image Restoration

Dynamic Attentive Graph Learning for Image Restoration

  • Paper: https://arxiv.org/abs/2109.06620
  • Code: https://github.com/jianzhangcs/DAGL

Super-Resolution

Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks

  • Paper: https://arxiv.org/abs/2004.03791
  • Code: https://github.com/LongguangWang/ArbSR

Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution

  • Paper: https://arxiv.org/abs/2108.05302
  • Code: https://github.com/JingyunLiang/MANet

Deep Reparametrization of Multi-Frame Super-Resolution and Denoising

  • Paper(Oral): https://arxiv.org/abs/2108.08286
  • Code: None

Dual-Camera Super-Resolution with Aligned Attention Modules

  • Homepage: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
  • Paper: https://arxiv.org/abs/2109.01349
  • Code: https://github.com/Tengfei-Wang/DualCameraSR
  • Dataset: https://tengfei-wang.github.io/Dual-Camera-SR/index.html

Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme

  • Paper: https://www4.comp.polyu.edu.hk/~cslzhang/paper/ICCV21_RealVSR.pdf
  • Code: https://github.com/IanYeung/RealVSR
  • Dataset: https://github.com/IanYeung/RealVSR

Denoising

Deep Reparametrization of Multi-Frame Super-Resolution and Denoising

  • Paper(Oral): https://arxiv.org/abs/2108.08286
  • Code: None

Rethinking Deep Image Prior for Denoising

  • Paper: https://arxiv.org/abs/2108.12841
  • Code: https://github.com/gistvision/DIP-denosing

Medical Image Denoising

Eformer: Edge Enhancement based Transformer for Medical Image Denoising

  • Paper: https://arxiv.org/abs/2109.08044
  • Code: None

Deblurring

Rethinking Coarse-to-Fine Approach in Single Image Deblurring

  • Paper: https://arxiv.org/abs/2108.05054
  • Code: https://github.com/chosj95/MIMO-UNet

Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions

  • Paper: https://arxiv.org/abs/2108.09108
  • Code: None

Shadow Removal

CANet: A Context-Aware Network for Shadow Removal

  • Paper: https://arxiv.org/abs/2108.09894
  • Code: https://github.com/Zipei-Chen/CANet

Video Frame Interpolation

XVFI: eXtreme Video Frame Interpolation

  • Paper(Oral): https://arxiv.org/abs/2103.16206
  • Code: https://github.com/JihyongOh/XVFI
  • Dataset: https://github.com/JihyongOh/XVFI

Asymmetric Bilateral Motion Estimation for Video Frame Interpolation

  • Paper: https://arxiv.org/abs/2108.06815
  • Code: https://github.com/JunHeum/ABME

Video Inpainting

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting

  • Paper: https://arxiv.org/abs/2109.02974
  • Code: https://github.com/ruiliu-ai/FuseFormer

Person Re-identification

TransReID: Transformer-based Object Re-Identification

  • Paper: https://arxiv.org/abs/2102.04378

  • Code: https://github.com/heshuting555/TransReID

IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID

  • Paper(Oral): https://arxiv.org/abs/2108.02413
  • Code: https://github.com/SikaStar/IDM

Person Search

Weakly Supervised Person Search with Region Siamese Networks

  • Paper: https://arxiv.org/abs/2109.06109
  • Code: None

2D/3D Human Pose Estimation

2D Human Pose Estimation

Human Pose Regression with Residual Log-likelihood Estimation

  • Paper(Oral): https://arxiv.org/abs/2107.11291
  • Code(RLE): https://github.com/Jeff-sjtu/res-loglikelihood-regression

Online Knowledge Distillation for Efficient Pose Estimation

  • Paper: https://arxiv.org/abs/2108.02092
  • Code: None

3D Human Pose Estimation

Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows

  • Paper: https://arxiv.org/abs/2107.13788
  • Code: https://github.com/twehrbein/Probabilistic-Monocular-3D-Human-Pose-Estimation-with-Normalizing-Flows

Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images

  • Paper: https://arxiv.org/abs/2109.05885
  • Code: None

6D Object Pose Estimation

StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation

  • Paper: https://arxiv.org/abs/2109.10115
  • Code: None
  • Dataset: None

3D Head Reconstruction

H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction

  • Homepage: https://crisalixsa.github.io/h3d-net/

  • Paper: https://arxiv.org/abs/2107.12512

Face Recognition

SynFace: Face Recognition with Synthetic Data

  • Paper: https://arxiv.org/abs/2108.07960
  • Code: None

Facial Expression Recognition

TransFER: Learning Relation-aware Facial Expression Representations with Transformers

  • Paper: https://arxiv.org/abs/2108.11116
  • Code: None

Action Recognition

MGSampler: An Explainable Sampling Strategy for Video Action Recognition

  • Paper: https://arxiv.org/abs/2104.09952
  • Code: None

Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition

  • Paper: https://arxiv.org/abs/2107.12213
  • Code: https://github.com/Uason-Chen/CTR-GCN

Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization

  • Paper: https://arxiv.org/abs/2108.02183
  • Code: None

Dynamic Network Quantization for Efficient Video Inference

  • Homepage: https://cs-people.bu.edu/sunxm/VideoIQ/project.html
  • Paper: https://arxiv.org/abs/2108.10394
  • Code: https://github.com/sunxm2357/VideoIQ

Temporal Action Localization

Enriching Local and Global Contexts for Temporal Action Localization

  • Paper: https://arxiv.org/abs/2107.12960
  • Code: None

Action Detection

Class Semantics-based Attention for Action Detection

  • Paper: https://arxiv.org/abs/2109.02613
  • Code: None

Group Activity Recognition

GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer

  • Paper: https://arxiv.org/abs/2108.12630
  • Code: https://github.com/xueyee/GroupFormer

Sign Language Recognition

Visual Alignment Constraint for Continuous Sign Language Recognition

  • Paper: https://arxiv.org/abs/2104.02330
  • Code: https://github.com/ycmin95/VAC_CSLR

Text Detection

Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection

  • Paper: https://arxiv.org/abs/2107.12664
  • Code: https://github.com/GXYM/TextBPN

Text Recognition

Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition

  • Paper: https://arxiv.org/abs/2107.12090
  • Code: None

Text Replacement

STRIVE: Scene Text Replacement In Videos

  • Homepage: https://striveiccv2021.github.io/STRIVE-ICCV2021/

  • Paper: https://arxiv.org/abs/2109.02762

  • Code: https://github.com/striveiccv2021/STRIVE-ICCV2021/

  • Datasets: https://github.com/striveiccv2021/STRIVE-ICCV2021/

Visual Question Answering (VQA)

Greedy Gradient Ensemble for Robust Visual Question Answering

  • Paper: https://arxiv.org/abs/2107.12651

  • Code: https://github.com/GeraldHan/GGE

Adversarial Attack

Feature Importance-aware Transferable Adversarial Attacks

  • Paper: https://arxiv.org/abs/2107.14185
  • Code: https://github.com/hcguoO0/FIA

AdvDrop: Adversarial Attack to DNNs by Dropping Information

  • Paper: https://arxiv.org/abs/2108.09034
  • Code: https://github.com/RjDuan/AdvDrop

Depth Estimation

Augmenting Depth Estimation with Geospatial Context

  • Paper: https://arxiv.org/abs/2109.09879
  • Code: None

NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo

  • Paper(Oral): https://arxiv.org/abs/2109.01129
  • Code: https://github.com/weiyithu/NerfingMVS

Monocular Depth Estimation

MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments

  • Paper: https://arxiv.org/abs/2107.12429
  • Code: None

Towards Interpretable Deep Networks for Monocular Depth Estimation

  • Paper: https://arxiv.org/abs/2108.05312
  • Code: https://github.com/youzunzhi/InterpretableMDE

Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark

  • Paper: https://arxiv.org/abs/2108.03830
  • Code: https://github.com/w2kun/RNW

Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation

  • Paper: https://arxiv.org/abs/2108.07628
  • Code: https://github.com/LINA-lln/ADDS-DepthNet

StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation

  • Paper: https://arxiv.org/abs/2108.08574
  • Code: https://github.com/SJTU-ViSYS/StructDepth

Gaze Estimation

Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation

  • Paper: https://arxiv.org/abs/2107.13780
  • Code: https://github.com/DreamtaleCore/PnP-GA

Crowd Counting

Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework

  • Paper(Oral): https://arxiv.org/abs/2107.12746
  • Code(P2PNet): https://github.com/TencentYoutuResearch/CrowdCounting-P2PNet

Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting

  • Paper: https://arxiv.org/abs/2107.12619
  • Code: https://github.com/TencentYoutuResearch/CrowdCounting-UEPNet

Lane Detection

VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection

  • Paper: https://arxiv.org/abs/2108.08482

  • Code: https://github.com/yujun0-0/MMA-Net

  • Dataset: https://github.com/yujun0-0/MMA-Net

Trajectory Prediction

Human Trajectory Prediction via Counterfactual Analysis

  • Paper: https://arxiv.org/abs/2107.14202
  • Code: https://github.com/CHENGY12/CausalHTP

Personalized Trajectory Prediction via Distribution Discrimination

  • Paper: https://arxiv.org/abs/2107.14204
  • Code: https://github.com/CHENGY12/DisDis

MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction

  • Paper: https://arxiv.org/abs/2108.09274
  • Code: https://github.com/selflein/MG-GAN

Social NCE: Contrastive Learning of Socially-aware Motion Representations

  • Paper: https://arxiv.org/abs/2012.11717
  • Code: https://github.com/vita-epfl/social-nce

Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving

  • Paper: https://arxiv.org/abs/2109.01510
  • Code: https://github.com/xrenaa/Safety-Aware-Motion-Prediction

Where are you heading? Dynamic Trajectory Prediction with Expert Goal Examples

  • Paper: https://openaccess.thecvf.com/content/ICCV2021/papers/Zhao_Where_Are_You_Heading_Dynamic_Trajectory_Prediction_With_Expert_Goal_ICCV_2021_paper.pdf
  • Code: https://github.com/JoeHEZHAO/expert_traj

Anomaly Detection

Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning

  • Paper: https://arxiv.org/abs/2101.10030
  • Code: https://github.com/tianyu0207/RTFM

Scene Graph Generation

Spatial-Temporal Transformer for Dynamic Scene Graph Generation

  • Paper: https://arxiv.org/abs/2107.12309
  • Code: None

Image Editing

Sketch Your Own GAN

  • Homepage: https://peterwang512.github.io/GANSketching/
  • Paper: https://arxiv.org/abs/2108.02774
  • Code: https://github.com/peterwang512/GANSketching

Image Synthesis

Image Synthesis via Semantic Composition

  • Homepage: https://shepnerd.github.io/scg/
  • Paper: https://arxiv.org/abs/2109.07053
  • Code: https://github.com/dvlab-research/SCGAN

Image Retrieval

Self-supervised Product Quantization for Deep Unsupervised Image Retrieval

  • Paper: https://arxiv.org/abs/2109.02244
  • Code: https://github.com/youngkyunJang/SPQ

3D Reconstruction

Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction

  • Paper: https://arxiv.org/abs/2109.00512

  • Code: https://github.com/facebookresearch/co3d

  • Dataset: https://github.com/facebookresearch/co3d

Video Stabilization

Out-of-boundary View Synthesis Towards Full-Frame Video Stabilization

  • Paper: https://arxiv.org/abs/2108.09041

  • Code: https://github.com/Annbless/OVS_Stabilization

Fine-Grained Recognition

Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach

  • Paper: https://arxiv.org/abs/2108.02399
  • Code: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
  • Dataset: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset

Style Transfer

AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer

  • Paper: https://arxiv.org/abs/2108.03647

  • Paddle Code: https://github.com/PaddlePaddle/PaddleGAN

  • PyTorch Code: https://github.com/Huage001/AdaAttN

Neural Painting

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction

  • Paper: https://arxiv.org/abs/2108.03798
  • Code: https://github.com/wzmsltw/PaintTransformer

Feature Matching

Learning to Match Features with Seeded Graph Matching Network

  • Paper: https://arxiv.org/abs/2108.08771

  • Code: https://github.com/vdvchen/SGMNet

Semantic Correspondence

Multi-scale Matching Networks for Semantic Correspondence

  • Paper: https://arxiv.org/abs/2108.00211
  • Code: https://github.com/wintersun661/MMNet

Edge Detection

Pixel Difference Networks for Efficient Edge Detection

  • Paper: https://arxiv.org/abs/2108.07009
  • Code: https://github.com/zhuoinoulu/pidinet

RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth

  • Paper: https://arxiv.org/abs/2108.00616
  • Code: https://github.com/MengyangPu/RINDNet
  • Dataset: https://github.com/MengyangPu/RINDNet

Camera Calibration

CTRL-C: Camera calibration TRansformer with Line-Classification

  • Paper: https://arxiv.org/abs/2109.02259
  • Code: https://github.com/jwlee-vcl/CTRL-C

Image Quality Assessment

MUSIQ: Multi-scale Image Quality Transformer

  • Paper: https://arxiv.org/abs/2108.05997
  • Code: https://github.com/google-research/google-research/tree/master/musiq

Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment

  • Paper: https://arxiv.org/abs/2108.07948
  • Code: https://github.com/researchmm/CKDN

Metric Learning

Deep Relational Metric Learning

  • Paper: https://arxiv.org/abs/2108.10026
  • Code: https://github.com/zbr17/DRML

Towards Interpretable Deep Metric Learning with Structural Matching

  • Paper: https://arxiv.org/abs/2108.05889
  • Code: https://github.com/wl-zhao/DIML

Unsupervised Domain Adaptation

Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation

  • Paper(Oral): https://arxiv.org/abs/2107.13467
  • Code: None

Video Rescaling

Self-Conditioned Probabilistic Learning of Video Rescaling

  • Paper: https://arxiv.org/abs/2107.11639

  • Code: None

Hand-Object Interaction

Learning a Contact Potential Field to Model the Hand-Object Interaction

  • Paper: https://arxiv.org/abs/2012.00924
  • Code: https://lixiny.github.io/CPF

Vision-and-Language Navigation

Airbert: In-domain Pretraining for Vision-and-Language Navigation

  • Paper: https://arxiv.org/abs/2108.09105
  • Code: https://airbert-vln.github.io/
  • Dataset: https://airbert-vln.github.io/

Datasets

Beyond Road Extraction: A Dataset for Map Update using Aerial Images

  • Homepage: https://favyen.com/muno21/

  • Paper: https://arxiv.org/abs/2110.04690

  • Code: https://github.com/favyen/muno21

StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation

  • Paper: https://arxiv.org/abs/2109.10115
  • Code: None
  • Dataset: None

RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth

  • Paper: https://arxiv.org/abs/2108.00616
  • Code: https://github.com/MengyangPu/RINDNet
  • Dataset: https://github.com/MengyangPu/RINDNet

Panoptic Narrative Grounding

  • Homepage: https://bcv-uniandes.github.io/panoptic-narrative-grounding/
  • Paper(Oral): https://arxiv.org/abs/2109.04988
  • Code: https://github.com/BCV-Uniandes/PNG
  • Dataset: https://github.com/BCV-Uniandes/PNG

STRIVE: Scene Text Replacement In Videos

  • Homepage: https://striveiccv2021.github.io/STRIVE-ICCV2021/

  • Paper: https://arxiv.org/abs/2109.02762

  • Code: https://github.com/striveiccv2021/STRIVE-ICCV2021/

  • Datasets: https://github.com/striveiccv2021/STRIVE-ICCV2021/

Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme

  • Paper: https://www4.comp.polyu.edu.hk/~cslzhang/paper/ICCV21_RealVSR.pdf
  • Code: https://github.com/IanYeung/RealVSR
  • Dataset: https://github.com/IanYeung/RealVSR

Matching in the Dark: A Dataset for Matching Image Pairs of Low-light Scenes

  • Paper: https://arxiv.org/abs/2109.03585

  • Code: None

Dual-Camera Super-Resolution with Aligned Attention Modules

  • Homepage: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
  • Paper: https://arxiv.org/abs/2109.01349
  • Code: https://github.com/Tengfei-Wang/DualCameraSR
  • Dataset: https://tengfei-wang.github.io/Dual-Camera-SR/index.html

DepthTrack: Unveiling the Power of RGBD Tracking

  • Paper: https://arxiv.org/abs/2108.13962
  • Code: https://github.com/xiaozai/DeT
  • Dataset: https://github.com/xiaozai/DeT

Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction

  • Paper: https://arxiv.org/abs/2109.00512

  • Code: https://github.com/facebookresearch/co3d

  • Dataset: https://github.com/facebookresearch/co3d

BioFors: A Large Biomedical Image Forensics Dataset

  • Paper: https://arxiv.org/abs/2108.12961
  • Code: None
  • Dataset: None

Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach

  • Paper: https://arxiv.org/abs/2108.02399
  • Code: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
  • Dataset: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset

Airbert: In-domain Pretraining for Vision-and-Language Navigation

  • Paper: https://arxiv.org/abs/2108.09105
  • Code: https://airbert-vln.github.io/
  • Dataset: https://airbert-vln.github.io/

Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation

  • Paper: http://arxiv.org/abs/2108.08202
  • Code: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
  • Dataset: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021

VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection

  • Paper: https://arxiv.org/abs/2108.08482

  • Code: https://github.com/yujun0-0/MMA-Net

  • Dataset: https://github.com/yujun0-0/MMA-Net

XVFI: eXtreme Video Frame Interpolation

  • Paper(Oral): https://arxiv.org/abs/2103.16206
  • Code: https://github.com/JihyongOh/XVFI
  • Dataset: https://github.com/JihyongOh/XVFI

Personalized Image Semantic Segmentation

  • Paper: https://arxiv.org/abs/2107.13978
  • Code: https://github.com/zhangyuygss/PIS
  • Dataset: https://github.com/zhangyuygss/PIS

H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction

  • Homepage: https://crisalixsa.github.io/h3d-net/

  • Paper: https://arxiv.org/abs/2107.12512

Others

Photon-Starved Scene Inference using Single Photon Cameras

  • Paper: https://openaccess.thecvf.com/content/ICCV2021/papers/Goyal_Photon-Starved_Scene_Inference_Using_Single_Photon_Cameras_ICCV_2021_paper.pdf

  • Code: https://github.com/bhavyagoyal/spclowlight

Towards Flexible Blind JPEG Artifacts Removal

  • Paper: https://arxiv.org/abs/2109.14573

  • Code: https://github.com/jiaxi-jiang/FBCNN

Generating Attribution Maps with Disentangled Masked Backpropagation

  • Paper: https://arxiv.org/abs/2101.06773
  • Code: https://gitlab.com/adriaruizo/dmbp_iccv21

CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations

  • Paper: https://arxiv.org/abs/2109.14910
  • Code: None

ReconfigISP: Reconfigurable Camera Image Processing Pipeline

  • Paper: https://arxiv.org/abs/2109.04760
  • Code: None

Panoptic Narrative Grounding

  • Homepage: https://bcv-uniandes.github.io/panoptic-narrative-grounding/
  • Paper(Oral): https://arxiv.org/abs/2109.04988
  • Code: https://github.com/BCV-Uniandes/PNG
  • Dataset: https://github.com/BCV-Uniandes/PNG

NEAT: Neural Attention Fields for End-to-End Autonomous Driving

  • Paper: https://arxiv.org/abs/2109.04456
  • Code: https://github.com/autonomousvision/neat

Keep CALM and Improve Visual Feature Attribution

  • Paper: https://arxiv.org/abs/2106.07861
  • Code: https://github.com/naver-ai/calm

YouRefIt: Embodied Reference Understanding with Language and Gesture

  • Paper: https://arxiv.org/abs/2109.03413
  • Code: None

Pri3D: Can 3D Priors Help 2D Representation Learning?

  • Paper: https://arxiv.org/abs/2104.11225
  • Code: https://github.com/Sekunde/Pri3D

Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain

  • Paper: https://arxiv.org/abs/2108.08487
  • Code: https://github.com/iCGY96/APR

Continual Learning for Image-Based Camera Localization

  • Paper: https://arxiv.org/abs/2108.09112
  • Code: None

Multi-Task Self-Training for Learning General Representations

  • Paper: https://arxiv.org/abs/2108.11353
  • Code: None

A Unified Objective for Novel Class Discovery

  • Homepage: https://ncd-uno.github.io/
  • Paper(Oral): https://arxiv.org/abs/2108.08536
  • Code: https://github.com/DonkeyShot21/UNO

Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs

  • Paper: https://arxiv.org/abs/2108.07884
  • Code: https://github.com/islamamirul/PermuteNet

Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation

  • Paper: http://arxiv.org/abs/2108.08202
  • Code: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
  • Dataset: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021

Impact of Aliasing on Generalization in Deep Convolutional Networks

  • Paper: https://arxiv.org/abs/2108.03489
  • Code: None

Out-of-Core Surface Reconstruction via Global TGV Minimization

  • Paper: https://arxiv.org/abs/2107.14790
  • Code: None

Progressive Correspondence Pruning by Consensus Learning

  • Homepage: https://sailor-z.github.io/projects/CLNet.html
  • Paper: https://arxiv.org/abs/2101.00591
  • Code: https://github.com/sailor-z/CLNet

Energy-Based Open-World Uncertainty Modeling for Confidence Calibration

  • Paper: https://arxiv.org/abs/2107.12628
  • Code: None

Generalized Shuffled Linear Regression

  • Paper: https://drive.google.com/file/d/1Qu21VK5qhCW8WVjiRnnBjehrYVmQrDNh/view?usp=sharing
  • Code: https://github.com/SILI1994/Generalized-Shuffled-Linear-Regression

Discovering 3D Parts from Image Collections

  • Homepage: https://chhankyao.github.io/lpd/

  • Paper: https://arxiv.org/abs/2107.13629

Semi-Supervised Active Learning with Temporal Output Discrepancy

  • Paper: https://arxiv.org/abs/2107.14153
  • Code: https://github.com/siyuhuang/TOD

Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?

  • Paper: https://arxiv.org/abs/2105.02498
  • Code: https://github.com/KingJamesSong/DifferentiableSVD

Hand-Object Contact Consistency Reasoning for Human Grasps Generation

  • Homepage: https://hwjiang1510.github.io/GraspTTA/
  • Paper(Oral): https://arxiv.org/abs/2104.03304
  • Code: None

Equivariant Imaging: Learning Beyond the Range Space

  • Paper(Oral): https://arxiv.org/abs/2103.14756
  • Code: https://github.com/edongdongchen/EI

Just Ask: Learning to Answer Questions from Millions of Narrated Videos

  • Paper(Oral): https://arxiv.org/abs/2012.00451
  • Code: https://github.com/antoyang/just-ask