vision-language-transformer topic

List vision-language-transformer repositories

BLIP

4.3k
Stars
573
Forks

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Vision-Language-Transformer

335
Stars
21
Forks

[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation

LAVIS

9.3k
Stars
921
Forks

LAVIS - A One-stop Library for Language-Vision Intelligence

GroundingDINO

5.3k
Stars
557
Forks

Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

AdvancedLiterateMachinery

1.4k
Stars
164
Forks

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

instructrl

50
Stars
5
Forks

Instruction Following Agents with Multimodal Transformers

APE

441
Stars
28
Forks

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception

UPop

90
Stars
7
Forks

[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.

CrossGET

24
Stars
0
Forks

[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.

ReLA

656
Stars
18
Forks

[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation