CVPR2024-Papers-with-Code
Welcome to share papers and code from CVPR 2025!
[Issue format] Paper title: Paper link: Code link:
[Sample] MambaVision: A Hybrid Mamba-Transformer Vision Backbone Paper: https://arxiv.org/abs/2407.08083 Code: https://github.com/NVlabs/MambaVision
LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes Paper: https://arxiv.org/abs/2501.04004 Code: https://github.com/Xiangxu-0103/LiMoE Project: https://ldkong.com/LiMoE
PAR: Parallelized Autoregressive Visual Generation Paper: https://arxiv.org/abs/2412.15119 Code: https://github.com/Epiphqny/PAR Project: https://epiphqny.github.io/PAR-project/
StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements Paper: https://arxiv.org/abs/2412.08503 Code: https://github.com/Westlake-AGI-Lab/StyleStudio Project: https://stylestudio-official.github.io/
Universal Actions for Enhanced Embodied Foundation Models Paper: https://arxiv.org/abs/2501.10105 Code: https://github.com/2toinf/UniAct Project: https://2toinf.github.io/UniAct/
HVI: A New Color Space for Low-light Image Enhancement Paper: https://arxiv.org/abs/2502.20272 Code: https://github.com/Fediory/HVI-CIDNet Demo: https://huggingface.co/spaces/Fediory/HVI-CIDNet_Low-light-Image-Enhancement_
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model Paper: https://arxiv.org/abs/2411.19108 Code: https://github.com/ali-vilab/TeaCache Project: https://liewfeng.github.io/TeaCache/ Topic: Visual Generation Acceleration
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos Paper: https://arxiv.org/abs/2409.02095 Code: https://github.com/Tencent/DepthCrafter Project: https://depthcrafter.github.io Topic: Depth Estimation
Generative Gaussian Splatting for Unbounded 3D City Generation Paper: https://arxiv.org/abs/2406.06526 Code: https://github.com/hzxie/GaussianCity Project: https://haozhexie.com/project/gaussian-city Hugging Face: https://huggingface.co/spaces/hzxie/gaussian-city Topic: 3D Generation, 3DGS (Gaussian Splatting)
DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution Paper: https://arxiv.org/abs/2405.16071 Code: https://github.com/callsys/DynRefer Topic: Multimodal Learning, MLLM
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos Paper: https://arxiv.org/abs/2411.17820 Code: https://github.com/ai4ce/CityWalker Project: https://ai4ce.github.io/CityWalker/ Topic: Embodied AI
ReDDiT: Efficient Diffusion as Low Light Enhancer Paper: https://arxiv.org/abs/2410.12346 Code: https://github.com/lgz-0713/ReDDiT Topic: Low-level vision, Image enhancement
Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis Paper: https://arxiv.org/abs/2412.02168 Code: https://github.com/pandayuanyu/generative-photography Project: https://generative-photography.github.io/project/ Dataset: https://huggingface.co/datasets/pandaphd/camera_settings Demo: https://huggingface.co/spaces/pandaphd/generative_photography Topic: Image / Video Generation, Camera Physics
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images Paper: https://arxiv.org/abs/2411.05738 Code: https://github.com/hyz317/StdGEN Project Page: https://stdgen.github.io/ Huggingface: https://huggingface.co/hyz317/StdGEN Topic: 3D Generation, Avatar
Retrieval-Augmented Personalization for Multimodal Large Language Models Paper: https://arxiv.org/abs/2410.13360 Code: https://github.com/Hoar012/RAP-MLLM Project Page: https://hoar012.github.io/RAP-Project/ Topic: MLLM
h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform Paper: https://arxiv.org/abs/2503.02187 Code: https://github.com/nktoan/h-edit Topic: Image Editing / Manipulation / Generation
NLPrompt: Noise-Label Prompt Learning for Vision-Language Models Paper: https://arxiv.org/abs/2412.01256 Code: https://github.com/qunovo/NLPrompt Topic: Vision-Language Models / Prompt Learning / Noisy Labels
Omnidirectional Multi-Object Tracking Paper: https://arxiv.org/abs/2503.04565 Code: https://github.com/xifen523/OmniTrack Topic: Object Tracking
Number it: Temporal Grounding Videos like Flipping Manga Paper: https://arxiv.org/abs/2411.10332 Code: https://github.com/yongliang-wu/NumPro Topic: Vision-Language Models / Video Understanding / MLLM
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation Paper: https://arxiv.org/abs/2411.18499 Code: https://github.com/LanceZPF/OpenING Project Page: https://opening-benchmark.github.io/ Topic: Datasets and Benchmarks
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models Paper: https://arxiv.org/abs/2411.15232 Code: https://github.com/HealthX-Lab/BiomedCoOp Topic: Vision-Language Models / Medical Imaging / Prompt Learning
IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification Paper: https://arxiv.org/abs/2503.10324 Code: https://github.com/924973292/IDEA Topic: Multi-modal Fusion / MLLM / Object ReID
PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability Paper: https://arxiv.org/abs/2503.08481 Code: https://github.com/unira-zwj/PhysVLM Topic: Embodied AI / MLLM
Unlocking Generalization Power in LiDAR Point Cloud Registration Paper: https://arxiv.org/abs/2503.10149 Code: https://github.com/peakpang/UGP Topic: 3D Registration
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression Paper: https://arxiv.org/abs/2412.04317 Code: https://github.com/codefanw/FlashSloth Topic: MLLM
MMRL: Multi-Modal Representation Learning for Vision-Language Models Paper: https://arxiv.org/abs/2503.08497 Code: https://github.com/yunncheng/MMRL Topic: CLIP
DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture Paper: https://arxiv.org/abs/2409.03550 Code: https://github.com/qianlong0502/DKDM Topic: Diffusion Models, Knowledge Distillation
DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry Paper: https://arxiv.org/abs/2503.13110 Code: https://github.com/jinli99/DTGBrepGen Topic: CAD Generation
Building Vision Models upon Heat Conduction Paper: https://arxiv.org/abs/2405.16555 Code: https://github.com/MzeroMiko/vHeat Topic: Visual Representation
SpiritSight Agent: Advanced GUI Agent with One Look Paper: https://arxiv.org/abs/2503.03196 Code: https://hzhiyuan.github.io/SpiritSight-Agent Topic: GUI Agent