

Awesome Prompt-Based Adaptation for Vision Models


A curated list of papers, resources and tools on Prompt-Based Adaptation (PA) for large-scale vision models.


Introduction and Motivation

Large vision models, such as Vision Transformers and convolutional backbones, are typically pretrained on massive datasets and then finetuned for downstream tasks. Finetuning all parameters is expensive and may erode pretrained knowledge. Prompt-Based Adaptation (PA) introduces small prompt parameters while freezing the backbone — steering pretrained models efficiently to new tasks.
The survey “Prompt-based Adaptation in Large-scale Vision Models: A Survey” defines PA as a unified framework covering both Visual Prompting (VP) and Visual Prompt Tuning (VPT).

  • VP modifies the input image via pixel-space prompts.
  • VPT injects learnable tokens inside the network.

Both achieve adaptation by updating only a small number of prompt parameters while retaining the generalization of the frozen pretrained backbone.


Table of Contents

  • Unified Taxonomy
    • Visual Prompting (VP)
      • VP-Fixed
      • VP-Learnable
      • VP-Generated
    • Visual Prompt Tuning (VPT)
      • VPT-Learnable
      • VPT-Generated
    • Efficiency Considerations
  • Applications Across Vision Tasks
    • Segmentation
    • Restoration & Enhancement
    • Compression
    • Multi-Modal Tasks
  • Domain-Specific Applications
    • Medical & Biomedical Imaging
    • Remote Sensing & Geospatial Analysis
    • Robotics & Embodied AI
    • Industrial Inspection & Manufacturing
    • Autonomous Driving & ADAS
    • 3D Point Clouds & LiDAR
  • Test-Time and Resource-Constrained Adaptation
  • Trustworthy AI
  • Related Surveys and Benchmarks
  • Contributing
  • Citation

Unified Taxonomy

PA methods are categorized by where prompts are injected (input vs. token space) and how they’re obtained (fixed, learnable, generated).

Visual Prompting (VP)

Prompts are applied directly to pixels before tokenization: $\tilde{x} = u(x; \theta)$, where $x$ is the input image and $u(\cdot\,; \theta)$ applies a prompt (overlay, padding, or mask) parameterized by $\theta$. A minimal code sketch of a learnable pixel prompt follows the table below.

  • VP-Fixed: no learnable parameters — static boxes, points, or masks (e.g., SAM).
  • VP-Learnable: optimize pixel-space overlays, frequency cues, or masks (e.g., Fourier VP, OT-VP).
  • VP-Generated: a generator produces adaptive image-level prompts (e.g., BlackVIP).

Title                           | Venue   | Year | Type      | Notes
Fourier Visual Prompting        | TMLR    | 2024 | Learnable | Frequency-domain cues
BlackVIP                        | CVPR    | 2023 | Generated | Zeroth-order black-box
Custom SAM                      |         | 2023 | Learnable | Medical segmentation
Insight Any Instance            |         | 2025 | Learnable | Remote sensing
Visual Prompting via Inpainting | NeurIPS | 2022 | Generated | Early adaptive VP
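
Below is a minimal sketch of the VP-Learnable setting: an additive border prompt trained against a frozen classifier, matching the formulation $\tilde{x} = u(x; \theta)$ above. The backbone choice (a torchvision ResNet-50 with weights omitted), the padding width, and the class name PaddedVisualPrompt are illustrative assumptions, not taken from any particular paper.

import torch
import torch.nn as nn
from torchvision.models import resnet50


class PaddedVisualPrompt(nn.Module):
    """u(x; theta) = x + mask * theta, where theta is a learnable border overlay."""

    def __init__(self, image_size: int = 224, pad: int = 30):
        super().__init__()
        self.prompt = nn.Parameter(torch.zeros(3, image_size, image_size))
        mask = torch.ones(1, image_size, image_size)
        mask[:, pad:-pad, pad:-pad] = 0.0   # only the image border carries the prompt
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mask * self.prompt


backbone = resnet50(weights=None)        # in practice: load pretrained weights here
backbone.requires_grad_(False)
backbone.eval()                          # the backbone stays frozen

prompt = PaddedVisualPrompt()
optimizer = torch.optim.Adam(prompt.parameters(), lr=1e-3)

x = torch.randn(8, 3, 224, 224)          # dummy batch standing in for downstream data
y = torch.randint(0, 1000, (8,))

logits = backbone(prompt(x))             # only the prompt parameters receive gradients
loss = nn.functional.cross_entropy(logits, y)
loss.backward()
optimizer.step()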

Visual Prompt Tuning (VPT)

VPT inserts learnable tokens into frozen model layers: $Z^{(\ell)} = [x_{\text{cls}}; P^{(\ell)}; x_1; \dots; x_N]$, where $P^{(\ell)}$ are the prompt tokens prepended to the patch tokens at layer $\ell$. A minimal code sketch follows the table below.

  • VPT-Learnable: prompt tokens are trained via gradient descent.
  • VPT-Generated: small networks produce adaptive prompt tokens.

Title | Venue | Year | Type      | Notes
VPT   | ECCV  | 2022 | Learnable | Foundational method
LPT   | ICLR  | 2023 | Learnable | Long-tailed classes
SA2VP | AAAI  | 2024 | Learnable | Spatially aligned 2D map
E2VPT | ICCV  | 2023 | Learnable | Key–value prompts
DVPT  | NN    | 2025 | Generated | Cross-attention generator
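
Below is a minimal sketch of shallow VPT-Learnable, matching the token layout above: learnable prompt tokens are concatenated with the [CLS] and patch tokens before a frozen encoder. The stand-in nn.TransformerEncoder (used here instead of a pretrained ViT), the prompt length, and the class name ShallowVPT are illustrative assumptions.

import torch
import torch.nn as nn


class ShallowVPT(nn.Module):
    def __init__(self, dim: int = 768, num_prompts: int = 10, num_classes: int = 100):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=12, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=12)   # ViT-B-like stand-in
        self.encoder.requires_grad_(False)                           # frozen backbone
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim), requires_grad=False)
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, dim) * 0.02)
        self.head = nn.Linear(dim, num_classes)                      # task head is trained

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        b = patch_tokens.size(0)
        cls = self.cls_token.expand(b, -1, -1)
        prompts = self.prompts.expand(b, -1, -1)
        z = torch.cat([cls, prompts, patch_tokens], dim=1)           # [x_cls; P; x_1..x_N]
        z = self.encoder(z)
        return self.head(z[:, 0])                                    # classify from the CLS token


model = ShallowVPT()
tokens = torch.randn(4, 196, 768)        # dummy 14x14 patch embeddings
logits = model(tokens)
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(logits.shape, trainable)           # only the prompts and the head are trainable

As in the original VPT recipe, only the prompt tokens and the lightweight task head receive gradients, which is what keeps the adaptation parameter-efficient.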

Applications Across Vision Tasks

Segmentation

Prompts help continual, multimodal, and few-shot segmentation (e.g., SAM-adapters, SA2VP).

Restoration & Enhancement

PromptIR and PromptRestorer inject degradation-aware prompts for denoising, dehazing, deraining, etc.

Compression

Prompt tokens control rate–distortion trade-offs in Transformer codecs and guide semantic compression in video.

Multi-Modal Tasks

Visual prompts condition multimodal models (CLIP, MLLMs) to refine image-language alignment and visual reasoning.


Domain-Specific Applications

Medical & Biomedical Imaging

Prompted SAM variants (CusSAM, Ma-SAM) adapt foundation models for 2D/3D medical segmentation and reporting.
VPT bridges visual–textual reasoning for clinical report generation.

Remote Sensing & Geospatial Analysis

RSPrompter, ZoRI, and PHTrack apply prompts for segmentation, change detection, and hyperspectral analysis.

Robotics & Embodied AI

Prompts adapt 2D backbones for 3D or motion reasoning (e.g., PointCLIP, ShapeLLM, GAPrompt).

Industrial Inspection & Manufacturing

Prompts steer SAM/CLIP for zero-shot defect segmentation and anomaly detection (e.g., ClipSAM, SAID).

Autonomous Driving & ADAS

Severity-aware and differentiable prompts improve perception in adverse conditions, with minimal retraining.

3D Point Clouds & LiDAR

Token-level prompts enhance geometric reasoning and fusion in LiDAR–camera systems (e.g., PointLoRA, PromptDet).


Test-Time and Resource-Constrained Adaptation

Prompts enable on-the-fly adaptation to unseen domains:

  • TTA: test-time prompt tuning (e.g., DynaPrompt, C-TPT); see the sketch after this list.
  • Black-Box: zeroth-order learning (e.g., BlackVIP).
  • Federated / Source-Free: decentralized personalized prompts (e.g., FedPrompt, DDFP).
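
Below is a minimal sketch of test-time prompt tuning in the spirit of entropy-minimization TTA methods: an additive image-level prompt is updated on an unlabeled test batch while the backbone stays frozen. The backbone (a randomly initialized resnet18 stand-in), the number of adaptation steps, and the learning rate are illustrative assumptions.

import torch
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18(weights=None)        # stands in for a pretrained, frozen model
backbone.requires_grad_(False)
backbone.eval()

prompt = nn.Parameter(torch.zeros(1, 3, 224, 224))   # additive image-level prompt
optimizer = torch.optim.Adam([prompt], lr=1e-2)

x_test = torch.randn(16, 3, 224, 224)    # unlabeled test batch

for _ in range(3):                       # a few adaptation steps per batch
    probs = backbone(x_test + prompt).softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    optimizer.zero_grad()
    entropy.backward()                   # gradients reach only the prompt
    optimizer.step()

with torch.no_grad():
    preds = backbone(x_test + prompt).argmax(dim=-1)  # predictions after adaptation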

Trustworthy AI

PA contributes to robustness, fairness, and privacy:

  • Robust prompts improve adversarial resistance.
  • Fairness prompts mitigate demographic bias.
  • Privacy prompts protect sensitive visual data.
  • Calibration-oriented prompts align predicted confidence with accuracy.

Related Surveys and Benchmarks

Title                                                         | Venue | Year | Notes
Prompt Learning in Computer Vision: A Survey                  | FITEE | 2024 | General overview
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models | arXiv | 2024 | PEFT methods
Prompt Engineering on Vision-Language Models                  | arXiv | 2023 | VL prompts
Visual Prompting in MLLMs                                     | arXiv | 2024 | MLLM prompts

Contributing

We welcome new papers, implementations, and corrections!
Please categorize contributions under:

  • VP-Fixed / VP-Learnable / VP-Generated
  • VPT-Learnable / VPT-Generated
  • Application domain (e.g., Medical, 3D, Remote Sensing)

Citation

If you find this survey useful in your research, please consider citing our paper:

@article{xiao2025prompt,
  title={Prompt-based Adaptation in Large-scale Vision Models: A Survey},
  author={Xiao, Xi and Zhang, Yunbei and Zhao, Lin and Liu, Yiyang and Liao, Xiaoying and Mai, Zheda and Li, Xingjian and Wang, Xiao and Xu, Hao and Hamm, Jihun and others},
  journal={arXiv preprint arXiv:2510.13219},
  year={2025}
}