CVPR 2024 Papers and Open-Source Projects Collection

CVPR 2023 Papers and Open-Source Projects Collection (Papers with Code)

A collection of CVPR 2023 papers and open-source projects (papers with code)!

25.78% = 2360 / 9155

CVPR 2023 decisions are now available on OpenReview! This year, we received a record number of 9155 submissions (a 12% increase over CVPR 2022), and accepted 2360 papers, for a 25.78% acceptance rate.
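
The acceptance rate quoted above can be reproduced with a few lines of Python. This is only a minimal sketch: the two counts are taken from the announcement, while the helper name `acceptance_rate` is a hypothetical name chosen for illustration.

```python
# Figures quoted in the CVPR 2023 announcement above.
SUBMISSIONS = 9155
ACCEPTED = 2360


def acceptance_rate(accepted: int, submissions: int) -> float:
    """Return the acceptance rate as a percentage (hypothetical helper)."""
    if submissions <= 0:
        raise ValueError("submissions must be positive")
    return 100.0 * accepted / submissions


rate = acceptance_rate(ACCEPTED, SUBMISSIONS)
# Format to two decimal places, matching the figure in the announcement.
print(f"{rate:.2f}% = {ACCEPTED} / {SUBMISSIONS}")
```

Rounded to two decimal places, the result matches the 25.78% figure stated in the announcement.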

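Note 1: Everyone is welcome to open an issue to share CVPR 2023 papers and open-source projects!

Note 2: For papers from previous years' top CV conferences, as well as other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision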

  • CVPR 2019
  • CVPR 2020
  • CVPR 2021
  • CVPR 2022

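If you would like to keep up with the latest high-quality CV papers, open-source projects, and learning materials, scan the QR code to join the [CVer Academic Exchange Group]! Let's learn from each other and improve together~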

[CVPR 2023 Papers with Open-Source Code: Contents]

  • Backbone
  • CLIP
  • MAE
  • GAN
  • GNN
  • MLP
  • NAS
  • OCR
  • NeRF
  • DETR
  • Diffusion Models
  • Avatars
  • Long-Tail
  • Vision Transformer
  • Vision-Language
  • Self-supervised Learning
  • Data Augmentation
  • Object Detection
  • Visual Tracking
  • Semantic Segmentation
  • Instance Segmentation
  • Panoptic Segmentation
  • Image Matting
  • Video Understanding
  • Image Editing
  • Low-level Vision
  • Super-Resolution
  • Deblur
  • 3D Point Cloud
  • 3D Object Detection
  • 3D Semantic Segmentation
  • 3D Object Tracking
  • 3D Human Pose Estimation
  • 3D Semantic Scene Completion
  • 3D Reconstruction
  • Medical Image
  • Video Generation
  • Knowledge Distillation
  • Trajectory Prediction
  • Datasets
  • New Tasks
  • Others

Backbone

Integrally Pre-Trained Transformer Pyramid Networks

  • Paper: https://arxiv.org/abs/2211.12735
  • Code: https://github.com/sunsmarterjie/iTPN

Stitchable Neural Networks

  • Homepage: https://snnet.github.io/
  • Paper: https://arxiv.org/abs/2302.06586
  • Code: https://github.com/ziplab/SN-Net

MAE

Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders

  • Paper: https://arxiv.org/abs/2212.06785
  • Code: https://github.com/ZrrSkywalker/I2P-MAE

NeRF

NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior

  • Homepage: https://nope-nerf.active.vision/
  • Paper: https://arxiv.org/abs/2212.07388
  • Code: None

Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures

  • Paper: https://arxiv.org/abs/2211.07600
  • Code: https://github.com/eladrich/latent-nerf

NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis

  • Paper: https://arxiv.org/abs/2301.08556
  • Code: None

DETR

DETRs with Hybrid Matching

  • Paper: https://arxiv.org/abs/2207.13080
  • Code: https://github.com/HDETR

NAS

PA&DA: Jointly Sampling PAth and DAta for Consistent NAS

  • Paper: https://arxiv.org/abs/2302.14772
  • Code: https://github.com/ShunLu91/PA-DA

Diffusion Models

Video Probabilistic Diffusion Models in Projected Latent Space

  • Homepage: https://sihyun.me/PVDM/
  • Paper: https://arxiv.org/abs/2302.07685
  • Code: https://github.com/sihyun-yu/PVDM

Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models

  • Paper: https://arxiv.org/abs/2211.10655
  • Code: None

Imagic: Text-Based Real Image Editing with Diffusion Models

  • Homepage: https://imagic-editing.github.io/
  • Paper: https://arxiv.org/abs/2210.09276
  • Code: None

Parallel Diffusion Models of Operator and Image for Blind Inverse Problems

  • Paper: https://arxiv.org/abs/2211.10656
  • Code: None

Vision Transformer

Integrally Pre-Trained Transformer Pyramid Networks

  • Paper: https://arxiv.org/abs/2211.12735
  • Code: https://github.com/sunsmarterjie/iTPN

Avatars

Structured 3D Features for Reconstructing Relightable and Animatable Avatars

  • Homepage: https://enriccorona.github.io/s3f/
  • Paper: https://arxiv.org/abs/2212.06820
  • Code: None
  • Demo: https://www.youtube.com/watch?v=mcZGcQ6L-2s

Vision-Language

GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods

  • Paper: https://arxiv.org/abs/2301.01893
  • Code: None

Teaching Structured Vision&Language Concepts to Vision&Language Models

  • Paper: https://arxiv.org/abs/2211.11733
  • Code: None

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

  • Paper: https://arxiv.org/abs/2211.09808
  • Code: https://github.com/fundamentalvision/Uni-Perceiver

Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training

  • Paper: https://arxiv.org/abs/2303.00040
  • Code: None

Object Detection

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

  • Paper: https://arxiv.org/abs/2207.02696
  • Code: https://github.com/WongKinYiu/yolov7

DETRs with Hybrid Matching

  • Paper: https://arxiv.org/abs/2207.13080
  • Code: https://github.com/HDETR

Enhanced Training of Query-Based Object Detection via Selective Query Recollection

  • Paper: https://arxiv.org/abs/2212.07593
  • Code: https://github.com/Fangyi-Chen/SQR

Visual Tracking

Simple Cues Lead to a Strong Multi-Object Tracker

  • Paper: https://arxiv.org/abs/2206.04656
  • Code: None

Knowledge Distillation

Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation

  • Paper: https://arxiv.org/abs/2302.14290
  • Code: None

Trajectory Prediction

IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction

  • Paper: https://arxiv.org/abs/2303.00575
  • Code: None

Others

Interactive Segmentation as Gaussian Process Classification

  • Paper: https://arxiv.org/abs/2302.14578
  • Code: None

Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger

  • Paper: https://arxiv.org/abs/2302.14677
  • Code: None

SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries

  • Homepage: http://bit.ly/splinecam
  • Paper: https://arxiv.org/abs/2302.12828
  • Code: None

SCOTCH and SODA: A Transformer Video Shadow Detection Framework

  • Paper: https://arxiv.org/abs/2211.06885
  • Code: None

DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization

  • Homepage: https://ai4ce.github.io/DeepMapping2/
  • Paper: https://arxiv.org/abs/2212.06331
  • Code: https://github.com/ai4ce/DeepMapping2

Token Turing Machines

  • Paper: https://arxiv.org/abs/2211.09119
  • Code: None

Single Image Backdoor Inversion via Robust Smoothed Classifiers

  • Paper: https://arxiv.org/abs/2303.00215
  • Code: https://github.com/locuslab/smoothinv

To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision

  • Paper: https://arxiv.org/abs/2106.09614
  • Code: https://github.com/unibas-gravis/Occlusion-Robust-MoFA

HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics

  • Homepage: https://dolorousrtur.github.io/hood/
  • Paper: https://arxiv.org/abs/2212.07242
  • Code: https://github.com/dolorousrtur/hood
  • Demo: https://www.youtube.com/watch?v=cBttMDPrUYY

A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others

  • Paper: https://arxiv.org/abs/2212.04825
  • Code: https://github.com/facebookresearch/Whac-A-Mole.git

RelightableHands: Efficient Neural Relighting of Articulated Hand Models

  • Homepage: https://sh8.io/#/relightable_hands
  • Paper: https://arxiv.org/abs/2302.04866
  • Code: None
  • Demo: https://sh8.io/static/media/teacher_video.923d87957fe0610730c2.mp4