Awesome-Prompt-Adapter-Learning-for-VLMs
A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP.
Table of Contents
- Papers
  - Surveys
  - Prompt Learning
  - Test-time Prompt Tuning
  - Video Prompting
Keywords
- Use text-based learnable prompts.
- Use image-based learnable prompts.
- Use text- and image-based learnable prompts.
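To make the first keyword concrete, here is a minimal, illustrative sketch of a CoOp-style text-based learnable prompt; it is not code from any listed repository, and the class name, dimensions, and tensor shapes are assumptions for illustration only.

```python
# Illustrative sketch only: a CoOp-style "text-based learnable prompt".
# Assumption: `class_token_embeds` are the frozen CLIP token embeddings of
# each class name; the learned context vectors are prepended to them before
# the text encoder, while the CLIP backbone itself stays frozen.
import torch
import torch.nn as nn

class LearnableTextPrompt(nn.Module):
    def __init__(self, n_ctx: int = 16, ctx_dim: int = 512):
        super().__init__()
        # n_ctx learnable context vectors, shared across all classes
        self.ctx = nn.Parameter(0.02 * torch.randn(n_ctx, ctx_dim))

    def forward(self, class_token_embeds: torch.Tensor) -> torch.Tensor:
        # class_token_embeds: (n_classes, n_name_tokens, ctx_dim)
        n_classes = class_token_embeds.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_classes, -1, -1)
        # Prompt per class = [learned context ; class-name tokens]
        return torch.cat([ctx, class_token_embeds], dim=1)
```

Image-based prompts apply the same idea on the vision side (e.g. VPT prepends learnable tokens to the image encoder's patch embeddings), and methods like MaPLe learn prompts in both branches.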
Papers
Surveys
- A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models. [Paper]
- Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey. [Paper]
Prompt Learning
Experimental Comparison
Base-to-Novel Generalization (ViT-B/16 CLIP).
| Methods | Pub | Base | Novel | HM (main) | Code |
|---|---|---|---|---|---|
| CLIP | ICML 21 | 69.34 | 74.22 | 71.70 | Link |
| CoOp | IJCV 22 | 82.69 | 63.22 | 71.66 | Link |
| CoCoOp | CVPR 22 | 80.47 | 71.69 | 75.83 | Link |
| ProDA | CVPR 22 | 81.56 | 72.30 | 76.65 | Link |
| RPO | ICCV 23 | 81.13 | 75.00 | 77.78 | Link |
| MaPLe | CVPR 23 | 82.28 | 75.14 | 78.55 | Link |
| MetaPrompt | TIP 24 | 83.65 | 75.48 | 79.09 | --- |
| DePT | CVPR 24 | 83.62 | 75.04 | 79.10 | Link |
| LASP | CVPR 23 | 83.18 | 76.11 | 79.48 | --- |
| TCP | CVPR 24 | 84.13 | 75.36 | 79.51 | Link |
| PromptSRC | ICCV 23 | 84.26 | 76.10 | 79.97 | Link |
| HPT | AAAI 24 | 84.32 | 76.86 | 80.23 | Link |
| CoPrompt | ICLR 24 | 84.00 | 77.23 | 80.48 | Link |
| PromptKD | CVPR 24 | 86.96 | 80.73 | 83.73 | Link |
Table 1. Average results on 11 datasets.
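In this benchmark, each dataset is split into base classes (seen during prompt tuning) and novel classes (held out), and HM is the harmonic mean of the two accuracies; a quick check against the CLIP row of Table 1:

```python
# Harmonic mean (HM) of base and novel accuracy, using the CLIP row of Table 1.
base, novel = 69.34, 74.22
hm = 2 * base * novel / (base + novel)
print(f"{hm:.2f}")  # 71.70
```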
Paper List
- CoOp: Learning to Prompt for Vision-Language Models. IJCV 2022. [Paper] [Code]
- CoCoOp: Conditional Prompt Learning for Vision-Language Models. CVPR 2022. [Paper] [Code]
- ProDA: Prompt Distribution Learning. CVPR 2022. [Paper] [Code]
- VPT: Visual Prompt Tuning. ECCV 2022. [Paper] [Code]
- MaPLe: Multi-modal Prompt Learning. CVPR 2023. [Paper] [Code]
- KgCoOp: Visual-Language Prompt Tuning with Knowledge-guided Context Optimization. CVPR 2023. [Paper] [Code]
- LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models. CVPR 2023. [Paper]
- DAM-VP: Diversity-Aware Meta Visual Prompting. CVPR 2023. [Paper] [Code]
- TaskRes: Task Residual for Tuning Vision-Language Models. CVPR 2023. [Paper] [Code]
- RPO: Read-only Prompt Optimization for Vision-Language Few-shot Learning. ICCV 2023. [Paper] [Code]
- KAPT: Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models. ICCV 2023. [Paper]
- ProGrad: Prompt-aligned Gradient for Prompt Tuning. ICCV 2023. [Paper] [Code]
- PromptSRC: Self-regulating Prompts: Foundational Model Adaptation without Forgetting. ICCV 2023. [Paper] [Code]
- DeFo: Learning to Decompose Visual Features with Latent Textual Prompts. ICLR 2023. [Paper]
- POMP: Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition. NeurIPS 2023. [Paper] [Code]
- MetaPrompt: Learning Domain Invariant Prompt for Vision-Language Models. TIP 2024. [Paper]
- SA2VP: Spatially Aligned-and-Adapted Visual Prompt. AAAI 2024. [Paper] [Code]
- LaViP: Language-Grounded Visual Prompts. AAAI 2024. [Paper] [Code]
- HPT: Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models. AAAI 2024. [Paper] [Code]
- CoPrompt: Consistency-guided Prompt Learning for Vision-Language Models. ICLR 2024. [Paper] [Code]
- ProText: Learning to Prompt with Text Only Supervision for Vision-Language Models. arXiv 2024. [Paper] [Code]
- PromptKD: Unsupervised Prompt Distillation for Vision Language Models. CVPR 2024. [Paper] [Code]
- DePT: Decoupled Prompt Tuning. CVPR 2024. [Paper] [Code]
- ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models. CVPR 2024. [Paper]
- TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model. CVPR 2024. [Paper] [Code]
Test-time Prompt Tuning
Experimental Comparison
| Methods | Pub | ImageNet | -A | -V2 | -R | -S | Avg. (main) | Code |
|---|---|---|---|---|---|---|---|---|
| CoOp | IJCV 22 | 71.51 | 49.71 | 64.20 | 75.21 | 47.99 | 59.28 | Link |
| CoCoOp | CVPR 22 | 71.02 | 50.63 | 64.07 | 76.18 | 48.75 | 59.91 | Link |
| TPT | NeurIPS 22 | 68.98 | 54.77 | 63.45 | 77.06 | 47.94 | 60.81 | Link |
| TPT+CoOp | NeurIPS 22 | 73.61 | 57.95 | 66.83 | 77.27 | 49.29 | 62.84 | Link |
| PromptAlign | NeurIPS 23 | --- | 59.37 | 65.29 | 79.33 | 50.23 | 63.55 | Link |
| TPS+CoOp | arXiv 24 | 73.73 | 60.49 | 66.84 | 77.44 | 49.08 | 65.52 | Link |
| RLCF | ICLR 24 | 73.23 | 65.45 | 69.77 | 83.35 | 54.74 | 68.33 | Link |
| RLCF+CoOp | ICLR 24 | 76.05 | 69.74 | 70.62 | 84.51 | 56.49 | 70.34 | Link |
Table 3. Test-time prompt tuning methods on OOD data.
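In this table, -A/-V2/-R/-S denote the ImageNet-A/V2/R/Sketch variants, and Avg. is (up to rounding) the mean accuracy over these four OOD sets, with the ImageNet column excluded; for example, on the CoOp row:

```python
# OOD average (the "Avg." column), checked on the CoOp row of Table 3:
# mean over ImageNet-A, -V2, -R, and -Sketch accuracies.
ood = [49.71, 64.20, 75.21, 47.99]
print(f"{sum(ood) / len(ood):.2f}")  # 59.28
```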
Paper List
- TPT: Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models. NeurIPS 2022. [Paper] [Code]
- SwapPrompt: Test-Time Prompt Adaptation for Vision-Language Models. NeurIPS 2023. [Paper]
- PromptAlign: Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization. NeurIPS 2023. [Paper] [Code]
- TPS: Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models. arXiv 2024. [Paper] [Code]
- RLCF: Test-time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models. ICLR 2024. [Paper] [Code]
- InTTA: Invariant Test-Time Adaptation for Vision-Language Model Generalization. arXiv 2024. [Paper] [Code]
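As context for the entries above, the recipe popularized by TPT adapts only the prompt at inference time, typically by minimizing the entropy of the averaged prediction over the most confident augmented views of a single test image. The sketch below is a rough illustration of that idea under stated assumptions, not any paper's exact implementation; `clip_logits`, `augment`, and all hyper-parameters are placeholders.

```python
# Rough sketch of TPT-style test-time prompt tuning (illustrative only).
# Assumptions: `clip_logits(views, prompt_params)` returns class logits for a
# batch of image views using the current (differentiable) prompt parameters,
# and `augment(image)` returns one randomly augmented view of the test image.
import torch
import torch.nn.functional as F

def test_time_tune(image, prompt_params, clip_logits, augment,
                   n_views=64, keep_ratio=0.1, lr=5e-3, steps=1):
    optimizer = torch.optim.AdamW([prompt_params], lr=lr)
    for _ in range(steps):
        views = torch.stack([augment(image) for _ in range(n_views)])
        probs = F.softmax(clip_logits(views, prompt_params), dim=-1)
        # Keep only the most confident (lowest-entropy) views.
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
        n_keep = max(1, int(n_views * keep_ratio))
        keep = entropy.topk(n_keep, largest=False).indices
        # Minimize the entropy of the averaged prediction over the kept views.
        avg_probs = probs[keep].mean(dim=0)
        loss = -(avg_probs * avg_probs.clamp_min(1e-12).log()).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return prompt_params
```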
Video Prompting
Experimental Comparison
Paper List
- Efficient-Prompt: Prompting Visual-Language Models for Efficient Video Understanding. ECCV 2022. [Paper] [Code]
- X-CLIP: Expanding Language-Image Pretrained Models for General Video Recognition. ECCV 2022. [Paper] [Code]
- RePro: Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection. ICLR 2023. [Paper] [Code]