Awesome-Prompt-Learning-for-Vision-Language-Models
A curated list of prompt learning methods for vision-language models.
Awesome-Prompt-Learning-for-VLMs
Table of Contents
- Papers
  - Surveys
  - Prompt Learning
  - Test-time Prompt Tuning
  - Video Prompting
Keywords
Use text-based learnable prompts.
Use image-based learnable prompts.
Use text- and image-based learnable prompts.
Papers
Surveys
- A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models. [Paper]
- Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey. [Paper]
Prompt Learning
Experimental Comparison
Base-to-Novel Generalization. (ViT-B/16 CLIP)
Methods | Pub | Base | Novel | HM (main) | Code |
---|---|---|---|---|---|
CLIP | ICML 21 | 69.34 | 74.22 | 71.70 | Link |
CoOp | IJCV 22 | 82.69 | 63.22 | 71.66 | Link |
CoCoOp | CVPR 22 | 80.47 | 71.69 | 75.83 | Link |
ProDA | CVPR 22 | 81.56 | 72.30 | 76.65 | Link |
RPO | ICCV 23 | 81.13 | 75.00 | 77.78 | Link |
MaPLe | CVPR 23 | 82.28 | 75.14 | 78.55 | Link |
MetaPrompt | TIP 24 | 83.65 | 75.48 | 79.09 | --- |
DePT | CVPR 24 | 83.62 | 75.04 | 79.10 | Link |
LASP | CVPR 23 | 83.18 | 76.11 | 79.48 | --- |
TCP | CVPR 24 | 84.13 | 75.36 | 79.51 | Link |
PromptSRC | ICCV 23 | 84.26 | 76.10 | 79.97 | Link |
HPT | AAAI 24 | 84.32 | 76.86 | 80.23 | Link |
CoPrompt | ICLR 24 | 84.00 | 77.23 | 80.48 | Link |
PromptKD | CVPR 24 | 86.96 | 80.73 | 83.73 | Link |
Table 1. Average results on 11 datasets.
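The HM column is the harmonic mean of base-class and novel-class accuracy, the usual summary metric for base-to-novel generalization. The following is a minimal sketch of that computation on a few rows copied from Table 1; the function name `harmonic_mean` is only illustrative. Reported values come from the respective papers, which may average or round differently, so a recomputed HM will not necessarily match every row exactly.

```python
def harmonic_mean(base: float, novel: float) -> float:
    """Harmonic mean of base-class and novel-class accuracy (in percent)."""
    if base + novel == 0:
        return 0.0
    return 2 * base * novel / (base + novel)


# (Base, Novel) pairs copied from Table 1.
rows = {
    "CLIP": (69.34, 74.22),
    "CoOp": (82.69, 63.22),
    "MaPLe": (82.28, 75.14),
}

for name, (base, novel) in rows.items():
    print(f"{name}: HM = {harmonic_mean(base, novel):.2f}")
```

Because the harmonic mean is dominated by the smaller of the two numbers, a method such as CoOp that gains on base classes but loses on novel classes ends up with an HM close to zero-shot CLIP.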
Paper List
- CoOp
  Learning to Prompt for Vision-Language Models. IJCV 2022.
  [Paper] [Code]
- CoCoOp
  Conditional Prompt Learning for Vision-Language Models. CVPR 2022.
  [Paper] [Code]
- ProDA
  Prompt Distribution Learning. CVPR 2022.
  [Paper] [Code]
- VPT
  Visual Prompt Tuning. ECCV 2022.
  [Paper] [Code]
- MaPLe
  MaPLe: Multi-modal Prompt Learning. CVPR 2023.
  [Paper] [Code]
- KgCoOp
  Visual-Language Prompt Tuning with Knowledge-guided Context Optimization. CVPR 2023.
  [Paper] [Code]
- LASP
  LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models. CVPR 2023.
  [Paper]
- DAM-VP
  Diversity-Aware Meta Visual Prompting. CVPR 2023.
  [Paper] [Code]
- TaskRes
  Task Residual for Tuning Vision-Language Models. CVPR 2023.
  [Paper] [Code]
- RPO
  Read-only Prompt Optimization for Vision-Language Few-shot Learning. ICCV 2023.
  [Paper] [Code]
- KAPT
  Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models. ICCV 2023.
  [Paper]
- ProGrad
  Prompt-aligned Gradient for Prompt Tuning. ICCV 2023.
  [Paper] [Code]
- PromptSRC
  Self-regulating Prompts: Foundational Model Adaptation without Forgetting. ICCV 2023.
  [Paper] [Code]
- DeFo
  Learning to Decompose Visual Features with Latent Textual Prompts. ICLR 2023.
  [Paper]
- POMP
  Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition. NeurIPS 2023.
  [Paper] [Code]
- MetaPrompt
  Learning Domain Invariant Prompt for Vision-Language Models. TIP 2024.
  [Paper]
- SA2VP
  SA2VP: Spatially Aligned-and-Adapted Visual Prompt. AAAI 2024.
  [Paper] [Code]
- LaViP
  LaViP: Language-Grounded Visual Prompts. AAAI 2024.
  [Paper] [Code]
- HPT
  Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models. AAAI 2024.
  [Paper] [Code]
- CoPrompt
  Consistency-guided Prompt Learning for Vision-Language Models. ICLR 2024.
  [Paper] [Code]
- ProText
  Learning to Prompt with Text Only Supervision for Vision-Language Models. arXiv 2024.
  [Paper] [Code]
- PromptKD
  Unsupervised Prompt Distillation for Vision Language Models. CVPR 2024.
  [Paper] [Code]
- DePT
  DePT: Decoupled Prompt Tuning. CVPR 2024.
  [Paper] [Code]
- ArGue
  ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models. CVPR 2024.
  [Paper]
- TCP
  TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model. CVPR 2024.
  [Paper] [Code]
Test-time Prompt Tuning
Experimental Comparison
Methods | Pub | ImageNet | -A | -V2 | -R | -S | Avg. (main) | Code |
---|---|---|---|---|---|---|---|---|
CoOp | IJCV 22 | 71.51 | 49.71 | 64.20 | 75.21 | 47.99 | 59.28 | Link |
CoCoOp | CVPR 22 | 71.02 | 50.63 | 64.07 | 76.18 | 48.75 | 59.91 | Link |
TPT | NeurIPS 22 | 68.98 | 54.77 | 63.45 | 77.06 | 47.94 | 60.81 | Link |
TPT+CoOp | NeurIPS 22 | 73.61 | 57.95 | 66.83 | 77.27 | 49.29 | 62.84 | Link |
PromptAlign | NeurIPS 23 | --- | 59.37 | 65.29 | 79.33 | 50.23 | 63.55 | Link |
TPS+CoOp | arXiv 24 | 73.73 | 60.49 | 66.84 | 77.44 | 49.08 | 65.52 | Link |
RLCF | ICLR 24 | 73.23 | 65.45 | 69.77 | 83.35 | 54.74 | 68.33 | Link |
RLCF+CoOp | ICLR 24 | 76.05 | 69.74 | 70.62 | 84.51 | 56.49 | 70.34 | Link |
Table 3. Test-time prompt tuning methods on OOD data.
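For most rows, the Avg. column matches the plain mean over the four out-of-distribution sets (ImageNet-A, -V2, -R, -S), with the in-distribution ImageNet column left out. This reading is inferred from the numbers in the table rather than stated in it, so treat the sketch below as an assumption; the helper name `ood_average` is illustrative, and some reported entries may not match a recomputation.

```python
from statistics import mean


def ood_average(a: float, v2: float, r: float, s: float) -> float:
    """Mean top-1 accuracy over ImageNet-A, -V2, -R and -S (ImageNet itself excluded)."""
    return mean([a, v2, r, s])


# (-A, -V2, -R, -S) values copied from the table above.
rows = {
    "CoOp": (49.71, 64.20, 75.21, 47.99),
    "CoCoOp": (50.63, 64.07, 76.18, 48.75),
    "RLCF": (65.45, 69.77, 83.35, 54.74),
}

for name, scores in rows.items():
    print(f"{name}: OOD avg = {ood_average(*scores):.2f}")
```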
Paper List
- TPT
  Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models. NeurIPS 2022.
  [Paper] [Code]
- SwapPrompt
  SwapPrompt: Test-Time Prompt Adaptation for Vision-Language Models. NeurIPS 2023.
  [Paper]
- PromptAlign
  Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization. NeurIPS 2023.
  [Paper] [Code]
- TPS
  Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models. arXiv 2024.
  [Paper] [Code]
- RLCF
  Test-time Adaptation with CLIP reward for zero-shot generalization in Vision-Language Models. ICLR 2024.
  [Paper] [Code]
- InTTA
  Invariant Test-Time Adaptation for Vision-Language Model Generalization. arXiv 2024.
  [Paper] [Code]
Video Prompting
Experimental Comparison
Paper List
- Efficient-Prompt
  Prompting visual-language models for efficient video understanding. ECCV 2022.
  [Paper] [Code]
- X-CLIP
  Expanding Language-Image Pretrained Models for General Video Recognition. ECCV 2022.
  [Paper] [Code]
- RePro
  Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection. ICLR 2023.
  [Paper] [Code]