CVPR2024-Papers-with-Code
Welcome to share papers and code from CVPR 2025!
[Issue format] Paper title: Paper link: Code link:
[Sample] MambaVision: A Hybrid Mamba-Transformer Vision Backbone Paper: https://arxiv.org/abs/2407.08083 Code: https://github.com/NVlabs/MambaVision
LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes Paper: https://arxiv.org/abs/2501.04004 Code: https://github.com/Xiangxu-0103/LiMoE Project: https://ldkong.com/LiMoE
PAR: Parallelized Autoregressive Visual Generation Paper: https://arxiv.org/abs/2412.15119 Code: https://github.com/Epiphqny/PAR Project: https://epiphqny.github.io/PAR-project/
StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements Paper: https://arxiv.org/abs/2412.08503 Code: https://github.com/Westlake-AGI-Lab/StyleStudio Project: https://stylestudio-official.github.io/
Universal Actions for Enhanced Embodied Foundation Models Paper: https://arxiv.org/abs/2501.10105 Code: https://github.com/2toinf/UniAct Project: https://2toinf.github.io/UniAct/
HVI: A New Color Space for Low-light Image Enhancement Paper: https://arxiv.org/abs/2502.20272 Code: https://github.com/Fediory/HVI-CIDNet Demo: https://huggingface.co/spaces/Fediory/HVI-CIDNet_Low-light-Image-Enhancement_
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model Paper: https://arxiv.org/abs/2411.19108 Code: https://github.com/ali-vilab/TeaCache Project: https://liewfeng.github.io/TeaCache/ Topic: Visual Generation Acceleration
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos Paper: https://arxiv.org/abs/2409.02095 Code: https://github.com/Tencent/DepthCrafter Project: https://depthcrafter.github.io Topic: Depth Estimation
Generative Gaussian Splatting for Unbounded 3D City Generation Paper: https://arxiv.org/abs/2406.06526 Code: https://github.com/hzxie/GaussianCity Project: https://haozhexie.com/project/gaussian-city Hugging Face: https://huggingface.co/spaces/hzxie/gaussian-city Topic: 3D Generation, 3DGS (Gaussian Splatting)
DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution Paper: https://arxiv.org/abs/2405.16071 Code: https://github.com/callsys/DynRefer Topic: Multimodal Learning, MLLM
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos Paper: https://arxiv.org/abs/2411.17820 Code: https://github.com/ai4ce/CityWalker Project: https://ai4ce.github.io/CityWalker/ Topic: Embodied AI
ReDDiT: Efficient Diffusion as Low Light Enhancer Paper: https://arxiv.org/abs/2410.12346 Code: https://github.com/lgz-0713/ReDDiT Topic: Low-level vision, Image enhancement
Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis Paper: https://arxiv.org/abs/2412.02168 Code: https://github.com/pandayuanyu/generative-photography Project: https://generative-photography.github.io/project/ Dataset: https://huggingface.co/datasets/pandaphd/camera_settings Demo: https://huggingface.co/spaces/pandaphd/generative_photography Topic: Image / Video Generation, Camera Physics
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images Paper: https://arxiv.org/abs/2411.05738 Code: https://github.com/hyz317/StdGEN Project Page: https://stdgen.github.io/ Huggingface: https://huggingface.co/hyz317/StdGEN Topic: 3D Generation, Avatar
Retrieval-Augmented Personalization for Multimodal Large Language Models Paper: https://arxiv.org/abs/2410.13360 Code: https://github.com/Hoar012/RAP-MLLM Project Page: https://hoar012.github.io/RAP-Project/ Topic: MLLM
h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform Paper: https://arxiv.org/abs/2503.02187 Code: https://github.com/nktoan/h-edit Topic: Image Editing / Manipulation / Generation
NLPrompt: Noise-Label Prompt Learning for Vision-Language Models Paper: https://arxiv.org/abs/2412.01256 Code: https://github.com/qunovo/NLPrompt Topic: Vision-Language Models / Prompt Learning / Noisy Labels
Omnidirectional Multi-Object Tracking Paper: https://arxiv.org/abs/2503.04565 Code: https://github.com/xifen523/OmniTrack Topic: Object Tracking
Number it: Temporal Grounding Videos like Flipping Manga Paper: https://arxiv.org/abs/2411.10332 Code: https://github.com/yongliang-wu/NumPro Topic: Vision-Language Models / Video Understanding / MLLM
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation Paper: https://arxiv.org/abs/2411.18499 Code: https://github.com/LanceZPF/OpenING Project Page: https://opening-benchmark.github.io/ Topic: Datasets and Benchmarks
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models Paper: https://arxiv.org/abs/2411.15232 Code: https://github.com/HealthX-Lab/BiomedCoOp Topic: Vision-Language Models / Medical Imaging / Prompt Learning
IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification Paper: https://arxiv.org/abs/2503.10324 Code: https://github.com/924973292/IDEA Topic: Multi-modal Fusion / MLLM / Object ReID
PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability Paper: https://arxiv.org/abs/2503.08481 Code: https://github.com/unira-zwj/PhysVLM Topic: Embodied AI / MLLM
Unlocking Generalization Power in LiDAR Point Cloud Registration Paper: https://arxiv.org/abs/2503.10149 Code: https://github.com/peakpang/UGP Topic: 3D Registration
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression Paper: https://arxiv.org/abs/2412.04317 Code: https://github.com/codefanw/FlashSloth Topic: MLLM
MMRL: Multi-Modal Representation Learning for Vision-Language Models Paper: https://arxiv.org/abs/2503.08497 Code: https://github.com/yunncheng/MMRL Topic: CLIP
DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture Paper: https://arxiv.org/abs/2409.03550 Code: https://github.com/qianlong0502/DKDM Topic: Diffusion Models, Knowledge Distillation
DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry Paper: https://arxiv.org/abs/2503.13110 Code: https://github.com/jinli99/DTGBrepGen Topic: CAD Generation
Building Vision Models upon Heat Conduction Paper: https://arxiv.org/abs/2405.16555 Code: https://github.com/MzeroMiko/vHeat Topic: Visual Representation
SpiritSight Agent: Advanced GUI Agent with One Look Paper: https://arxiv.org/abs/2503.03196 Code: https://hzhiyuan.github.io/SpiritSight-Agent Topic: GUI Agent