CVPR2024-Papers-with-Code
Welcome to share CVPR 2023 papers and code.
[Issue format] Paper name/title: Paper link: Code link:
Paper title: DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization Paper link: https://arxiv.org/abs/2212.06331 Code link: https://github.com/ai4ce/DeepMapping2
Paper title: VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion Paper link: https://arxiv.org/abs/2302.12251 Code link: https://github.com/NVlabs/VoxFormer
Paper title: PolyFormer: Referring Image Segmentation as Sequential Polygon Generation Paper link: https://arxiv.org/abs/2302.07387
[Update: May 12th, 2023] @amusi Could you please add the code link as well? https://github.com/amazon-science/polygon-transformer
Thank you!
Paper title: All in One: Exploring Unified Video-Language Pre-training Paper link: https://arxiv.org/abs/2203.07303 Code link: https://github.com/showlab/all-in-one
Paper title: Position-guided Text Prompt for Vision Language Pre-training Paper link: https://arxiv.org/abs/2212.09737 Code link: https://github.com/sail-sg/ptp
Paper title: GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis Paper link: https://arxiv.org/abs/2301.12959 Code link: https://github.com/tobran/GALIP
GALIP is a simple, fast, and high-quality text-to-image generative model that achieves comparable or better results than large pretrained autoregressive and diffusion models, with 120x faster synthesis. Whereas those models require hundreds of GPUs, 400M image-text pairs, and several weeks of pre-training, GALIP needs only 8 RTX 3090 GPUs, 12M image-text pairs, and 3 days. It also supports CPU-only generation. The code and pre-trained models have been released.
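Since GALIP's title ("Generative Adversarial CLIPs") points to building a GAN around a pretrained CLIP model, the sketch below illustrates that general pattern: a generator conditioned on frozen CLIP text features. The ToyCLIPGenerator class, its layer sizes, and the prompt are hypothetical illustrations only, not GALIP's actual architecture; see the linked repo for the real implementation.

```python
# Toy sketch of a CLIP-conditioned GAN generator (hypothetical; NOT
# GALIP's actual architecture). Requires: pip install torch and
# pip install git+https://github.com/openai/CLIP.git
import torch
import torch.nn as nn
import clip


class ToyCLIPGenerator(nn.Module):
    """Maps (noise, frozen CLIP text embedding) to a small RGB image."""

    def __init__(self, noise_dim=100, text_dim=512, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(noise_dim + text_dim, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, 3 * img_size * img_size),
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, z, text_emb):
        # Concatenate noise and text conditioning, decode to an image.
        x = torch.cat([z, text_emb], dim=1)
        return self.net(x).view(-1, 3, self.img_size, self.img_size)


# CPU-only inference, in the spirit of the paper's "no GPU needed" claim.
device = "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
tokens = clip.tokenize(["a photo of a red bird"]).to(device)
with torch.no_grad():
    text_emb = clip_model.encode_text(tokens).float()  # shape (1, 512)

generator = ToyCLIPGenerator()
z = torch.randn(1, 100)
image = generator(z, text_emb)  # shape (1, 3, 64, 64); weights untrained
```

In this family of models, much of the text-image alignment comes from the frozen CLIP weights rather than from scratch training, which is one plausible reason the pre-training budget can be so much smaller.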
Paper title: HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising Project page: https://aminshabani.github.io/housediffusion/ Paper link: https://arxiv.org/abs/2211.13287 Code link: https://github.com/aminshabani/house_diffusion
Paper title: Vision Transformers are Parameter-Efficient Audio-Visual Learners Project page: https://yanbo.ml/project_page/LAVISH/ Code link: https://github.com/GenjiB/LAVISH
Paper title: EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding Paper link: https://arxiv.org/abs/2209.14941 Code link: https://github.com/yanmin-wu/EDA
This paper can be categorized into "3D Visual and Language".
DeepMapping2 (submitted above) can be categorized into "3D Point Cloud".
Paper title: Generic-to-Specific Distillation of Masked Autoencoders Paper link: https://arxiv.org/abs/2302.14771 Code link: https://github.com/pengzhiliang/G2SD
This paper can be categorized into "Knowledge Distillation" or "Masked Autoencoders". Thank you!
Paper title: Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples Paper link: https://arxiv.org/abs/2301.01217 Code link: https://github.com/jiamingzhang94/Unlearnable-Clusters
Paper title: MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation Paper link: https://arxiv.org/abs/2212.09478 Code link: https://github.com/researchmm/MM-Diffusion
Paper title: Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation Paper link: https://arxiv.org/abs/2211.13202 Code link: https://github.com/noahzn/Lite-Mono
Paper title: AdaptiveMix: Robust Feature Representation via Shrinking Feature Space Paper link: https://arxiv.org/pdf/2303.01559.pdf Code link: https://github.com/WentianZhang-ML/AdaptiveMix
Paper title: DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting Paper link: https://arxiv.org/pdf/2211.10772v3.pdf Code link: https://github.com/ViTAE-Transformer/DeepSolo
Thank you!
Paper title: DepGraph: Towards Any Structural Pruning Paper link: https://arxiv.org/abs/2301.12900 Code link: https://github.com/VainF/Torch-Pruning
Thank you! This paper should be categorized as "Network Pruning".
Paper title: Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption Paper link: https://arxiv.org/abs/2207.03442 Code link: https://github.com/shiyegao/DDA
Thank you!
Paper title: 3D Video Loops from Asynchronous Input Paper link: https://arxiv.org/abs/2303.05312 Project page: https://limacv.github.io/VideoLoop3D_web/ Code link: https://github.com/limacv/VideoLoop3D
This paper should be in a new category named "Novel View Synthesis", which I believe is also a hot topic with many more papers. If no new section can be added, it can also be categorized under NeRF. Thank you!
Paper title: Super-Resolution Neural Operator Paper link: https://arxiv.org/abs/2303.02584 Code link: https://github.com/2y7c3/Super-Resolution-Neural-Operator
Paper name/title: Learning Transferable Spatiotemporal Representations from Natural Script Knowledge Paper link: https://arxiv.org/abs/2209.15280 Code link: https://github.com/TencentARC/TVTS
Paper name/title: DPE: Disentanglement of Pose and Expression for General Video Portrait Editing Paper link: https://arxiv.org/abs/2301.06281 Code link: https://carlyx.github.io/DPE/
Paper name/title: SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation Paper link: https://arxiv.org/abs/2211.12194 Code link: https://github.com/Winfredy/SadTalker
Paper name/title: DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network Paper link: https://arxiv.org/abs/2303.02165 Code link: https://github.com/alibaba/lightweight-neural-architecture-search
Please put it in the Backbone chapter of the README.md.
Paper title: DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation Paper link: https://arxiv.org/abs/2303.06285 Code link: https://github.com/Yueming6568/DeltaEdit
Thank you :) Please put it in the GAN / CLIP / image manipulation / image generation chapters.
The arXiv link for BiFormer is now available. Please update. Thanks!
Paper name/title: BiFormer: Vision Transformer with Bi-Level Routing Attention Paper link: https://arxiv.org/abs/2303.08810 Code link: https://github.com/rayleizhu/BiFormer
Paper title: TriDet: Temporal Action Detection with Relative Boundary Modeling Paper link: https://arxiv.org/pdf/2303.07347.pdf Code link: https://github.com/dingfengshi/TriDet
Maybe it can be put in Video Understanding, or in a new Action Detection chapter? Thank you!
Thanks for maintaining this list. Paper title: Causal-IR: Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective Paper link: https://arxiv.org/pdf/2303.06859.pdf Code link: https://github.com/lixinustc/Casual-IR-DIL
The code will be released soon.
Paper title: Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation Paper link: https://arxiv.org/abs/2303.11203 Code link: https://github.com/l1997i/lim3d
The code will be released soon. Thanks in advance!
Paper name/title: GFPose: Learning 3D Human Pose Prior with Gradient Fields Paper link: https://arxiv.org/pdf/2212.08641.pdf Code link: https://github.com/Embracing/GFPose
Thank you!
Paper name/title: Diversity-Aware Meta Visual Prompting Paper link: https://arxiv.org/abs/2303.08138 Code link: https://github.com/shikiw/DAM-VP
Thanks a lot!