Awesome Prompt-Based Adaptation for Vision Models
A curated list of papers, resources and tools on Prompt-Based Adaptation (PA) for large-scale vision models.
Introduction and Motivation
Large vision models, such as Vision Transformers and convolutional backbones, are typically pretrained on massive datasets and then fine-tuned for downstream tasks. Fine-tuning all parameters is expensive and can erode pretrained knowledge. Prompt-Based Adaptation (PA) instead freezes the backbone and trains only a small set of prompt parameters, steering the pretrained model to new tasks at a fraction of the cost.
The survey “Prompt-based Adaptation in Large-scale Vision Models: A Survey” defines PA as a unified framework covering both Visual Prompting (VP) and Visual Prompt Tuning (VPT).
- VP modifies the input image via pixel-space prompts.
- VPT injects learnable tokens inside the network.
Both achieve adaptation with minimal parameter updates and strong generalization.
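As a rough, method-agnostic sketch of this recipe, the snippet below freezes a pretrained ViT and allocates a handful of prompt parameters as the only trainable state. The use of `timm` and the specific model name are assumptions for illustration, not part of any particular paper.

```python
# Minimal sketch of the PA recipe: freeze the backbone, train only prompts.
# Assumes timm is installed; the model name is just an example.
import torch
import torch.nn as nn
import timm

backbone = timm.create_model("vit_base_patch16_224", pretrained=True)
for p in backbone.parameters():
    p.requires_grad = False  # the pretrained weights stay frozen

# The only trainable parameters: 10 prompt tokens of the ViT's embed dim.
prompts = nn.Parameter(torch.randn(10, backbone.embed_dim) * 0.02)

total = sum(p.numel() for p in backbone.parameters())
print(f"prompt params: {prompts.numel():,} vs backbone params: {total:,}")
```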
Table of Contents
- Unified Taxonomy
  - Visual Prompting (VP)
    - VP-Fixed
    - VP-Learnable
    - VP-Generated
  - Visual Prompt Tuning (VPT)
    - VPT-Learnable
    - VPT-Generated
- Applications Across Vision Tasks
  - Segmentation
  - Restoration & Enhancement
  - Compression
  - Multi-Modal Tasks
- Domain-Specific Applications
  - Medical & Biomedical Imaging
  - Remote Sensing & Geospatial Analysis
  - Robotics & Embodied AI
  - Industrial Inspection & Manufacturing
  - Autonomous Driving & ADAS
  - 3D Point Clouds & LiDAR
- Test-Time and Resource-Constrained Adaptation
- Trustworthy AI
- Related Surveys and Benchmarks
- Contributing
- Citation
Unified Taxonomy
PA methods are categorized by where prompts are injected (input vs. token space) and how they’re obtained (fixed, learnable, generated).
Visual Prompting (VP)
Prompts are applied directly to pixels before tokenization: $\tilde{x} = u(x; \theta)$, where the prompting function $u$ perturbs the input image $x$ using prompt parameters $\theta$.
- VP-Fixed: no learnable parameters — static boxes, points, or masks (e.g., SAM).
- VP-Learnable: optimize pixel-space overlays, frequency cues, or masks (e.g., Fourier VP, OT-VP).
- VP-Generated: a generator produces adaptive image-level prompts (e.g., BlackVIP).
| Title | Venue | Year | Type | Notes |
|---|---|---|---|---|
| Fourier Visual Prompting | TMLR | 2024 | Learnable | Frequency-domain cues |
| BlackVIP | CVPR | 2023 | Generated | Zeroth-order black-box |
| Custom SAM | - | 2023 | Learnable | Medical segmentation |
| Insight Any Instance | - | 2025 | Learnable | Remote sensing |
| Visual Prompting via Inpainting | NeurIPS | 2022 | Generated | Early adaptive VP |
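To make the VP-Learnable idea concrete, here is a minimal sketch of a padding-style pixel prompt: a trainable border added to every image before the frozen model sees it. The class name and sizes are illustrative assumptions, not a specific paper's implementation.

```python
# Minimal sketch of a learnable pixel-space prompt u(x; theta): a trainable
# border ("padding prompt") added to the input image. Names/sizes illustrative.
import torch
import torch.nn as nn

class PaddedVisualPrompt(nn.Module):
    def __init__(self, image_size: int = 224, pad: int = 30):
        super().__init__()
        # theta: learnable pixels for the four border strips.
        interior = image_size - 2 * pad
        self.top = nn.Parameter(torch.randn(3, pad, image_size) * 0.02)
        self.bottom = nn.Parameter(torch.randn(3, pad, image_size) * 0.02)
        self.left = nn.Parameter(torch.randn(3, interior, pad) * 0.02)
        self.right = nn.Parameter(torch.randn(3, interior, pad) * 0.02)
        self.register_buffer("center", torch.zeros(3, interior, interior))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assemble the full-size prompt; the interior stays zero, so gradients
        # flow only into the border pixels. x: (B, 3, H, W), H = W = image_size.
        middle = torch.cat([self.left, self.center, self.right], dim=2)
        prompt = torch.cat([self.top, middle, self.bottom], dim=1)
        return x + prompt.unsqueeze(0)  # x_tilde = u(x; theta)
```

Training then points an optimizer only at the prompt, e.g. `torch.optim.SGD(vp.parameters(), ...)`, while the downstream model stays untouched.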
Visual Prompt Tuning (VPT)
VPT inserts learnable tokens into frozen model layers: $Z^{(\ell)} = [x_{\mathrm{cls}};\, P^{(\ell)};\, x_1; \dots; x_N]$, where $P^{(\ell)}$ denotes the prompt tokens prepended to the $N$ patch tokens at layer $\ell$.
- VPT-Learnable: prompt tokens are trained via gradient descent.
- VPT-Generated: small networks produce adaptive prompt tokens.
| Title | Venue | Year | Type | Notes |
|---|---|---|---|---|
| VPT | ECCV | 2022 | Learnable | Foundational method |
| LPT | ICLR | 2023 | Learnable | Long-tailed classes |
| SA2VP | AAAI | 2024 | Learnable | Spatially aligned 2D map |
| E2VPT | ICCV | 2023 | Learnable | Key–value prompts |
| DVPT | NN | 2025 | Generated | Cross-attention generator |
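As a simplified sketch of the token-level mechanism (deep-VPT style, not the reference implementation), each frozen block receives fresh prompt tokens prepended after the CLS token; `block`, `num_prompts`, and `dim` are illustrative.

```python
# Minimal sketch of VPT-style token injection for one frozen ViT block,
# following Z = [x_cls; P; x_1; ...; x_N].
import torch
import torch.nn as nn

class PromptedBlock(nn.Module):
    def __init__(self, block: nn.Module, num_prompts: int = 10, dim: int = 768):
        super().__init__()
        self.block = block  # parameters assumed frozen by the caller
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, dim) * 0.02)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (B, 1 + N, dim) = [CLS token; N patch tokens]
        p = self.prompts.expand(z.size(0), -1, -1)
        z = self.block(torch.cat([z[:, :1], p, z[:, 1:]], dim=1))
        # Deep VPT inserts fresh prompts per layer, so drop this layer's slots.
        return torch.cat([z[:, :1], z[:, 1 + p.size(1):]], dim=1)
```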
Applications Across Vision Tasks
Segmentation
Prompts help continual, multimodal, and few-shot segmentation (e.g., SAM-adapters, SA2VP).
Restoration & Enhancement
PromptIR and PromptRestorer inject degradation-aware prompts for denoising, dehazing, deraining, etc.
Compression
Prompt tokens control rate–distortion trade-offs in Transformer codecs and guide semantic compression in video.
Multi-Modal Tasks
Visual prompts condition multimodal models (CLIP, MLLMs) to refine image-language alignment and visual reasoning.
Domain-Specific Applications
Medical & Biomedical Imaging
Prompted SAM variants (CusSAM, Ma-SAM) adapt foundation models for 2D/3D medical segmentation and reporting.
VPT bridges visual–textual reasoning for clinical report generation.
Remote Sensing & Geospatial Analysis
RSPrompter, ZoRI, and PHTrack apply prompts for segmentation, change detection, and hyperspectral analysis.
Robotics & Embodied AI
Prompts adapt 2D backbones for 3D or motion reasoning (e.g., PointCLIP, ShapeLLM, GAPrompt).
Industrial Inspection & Manufacturing
Prompts steer SAM/CLIP for zero-shot defect segmentation and anomaly detection (e.g., ClipSAM, SAID).
Autonomous Driving & ADAS
Severity-aware and differentiable prompts improve perception in adverse conditions, with minimal retraining.
3D Point Clouds & LiDAR
Token-level prompts enhance geometric reasoning and fusion in LiDAR–camera systems (e.g., PointLoRA, PromptDet).
Test-Time and Resource-Constrained Adaptation
Prompts enable on-the-fly adaptation to unseen domains:
- Test-time adaptation (TTA): prompts are tuned on unlabeled test data (e.g., DynaPrompt, C-TPT).
- Black-box: zeroth-order prompt learning when backbone gradients are unavailable (e.g., BlackVIP).
- Federated / source-free: decentralized, personalized prompts (e.g., FedPrompt, DDFP).
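A minimal sketch of entropy-minimization-style test-time prompt tuning, in the spirit of TPT-like methods; `model(x, prompt)` is a hypothetical interface that applies a pixel- or token-level prompt before a frozen backbone, and the hyperparameters are illustrative.

```python
# Minimal sketch of test-time prompt tuning: adapt only the prompt on an
# unlabeled test batch by minimizing prediction entropy.
import torch

def tta_prompt_step(model, prompt: torch.Tensor, x: torch.Tensor,
                    lr: float = 1e-2, steps: int = 1) -> torch.Tensor:
    # prompt: a leaf tensor with requires_grad=True; the backbone is frozen.
    opt = torch.optim.SGD([prompt], lr=lr)
    for _ in range(steps):
        probs = model(x, prompt).softmax(dim=-1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
        opt.zero_grad()
        entropy.backward()  # gradients reach only the prompt tensor
        opt.step()
    return prompt
```

Black-box settings such as BlackVIP cannot backpropagate through the model at all and instead estimate the prompt gradient with zeroth-order methods (e.g., SPSA).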
Trustworthy AI
PA contributes to robustness, fairness, privacy, and calibration:
- Robust prompts improve adversarial resistance.
- Fairness prompts mitigate demographic bias.
- Privacy prompts protect sensitive visual data.
- Calibration aligns confidence with accuracy.
Related Surveys and Benchmarks
| Title | Venue | Year | Notes |
|---|---|---|---|
| Prompt Learning in Computer Vision: A Survey | FITEE | 2024 | General overview |
| Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models | arXiv | 2024 | PEFT methods |
| Prompt Engineering on Vision-Language Models | arXiv | 2023 | VL prompts |
| Visual Prompting in MLLMs | arXiv | 2024 | MLLM prompts |
Contributing
We welcome new papers, implementations, and corrections!
Please categorize each entry as:
- VP-Fixed / VP-Learnable / VP-Generated
- VPT-Learnable / VPT-Generated

and note its application domain (e.g., Medical, 3D, Remote Sensing).
Citation
If you find this survey useful in your research, please consider citing our paper:
```bibtex
@article{xiao2025prompt,
  title={Prompt-based Adaptation in Large-scale Vision Models: A Survey},
  author={Xiao, Xi and Zhang, Yunbei and Zhao, Lin and Liu, Yiyang and Liao, Xiaoying and Mai, Zheda and Li, Xingjian and Wang, Xiao and Xu, Hao and Hamm, Jihun and others},
  journal={arXiv preprint arXiv:2510.13219},
  year={2025}
}
```