vision-language-transformer topic
BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Vision-Language-Transformer
[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
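Since LAVIS is a library rather than a single paper implementation, a minimal captioning sketch may help illustrate its intended use. This follows the pattern documented in the LAVIS README; the image path and the "blip_caption" / "base_coco" model choice are assumptions for illustration and depend on the installed version.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

# Pick GPU if available; LAVIS models are standard PyTorch modules.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# "example.jpg" is a placeholder path for this sketch.
raw_image = Image.open("example.jpg").convert("RGB")

# Load a captioning model plus its paired image/text preprocessors.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

# Preprocess to a batched tensor and generate a caption.
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
captions = model.generate({"image": image})
print(captions)
```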
GroundingDINO
Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
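For Grounding DINO, a short open-vocabulary detection sketch in the style of the repository's README may clarify what "open-set object detection" looks like in practice. The config and checkpoint paths, image path, prompt, and thresholds below are assumptions; consult the repository for the exact files shipped with your checkout.

```python
import cv2
from groundingdino.util.inference import load_model, load_image, predict, annotate

# Paths are placeholders: point them at the config and weights from the repo's releases.
model = load_model(
    "groundingdino/config/GroundingDINO_SwinT_OGC.py",
    "weights/groundingdino_swint_ogc.pth",
)

# Text prompt lists the categories to ground, separated by " . ".
image_source, image = load_image("example.jpg")
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="chair . person . dog .",
    box_threshold=0.35,
    text_threshold=0.25,
)

# Draw the detected boxes and their matched phrases on the original image.
annotated = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)
cv2.imwrite("annotated_image.jpg", annotated)
```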
AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
instructrl
Instruction Following Agents with Multimodal Transformers
APE
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
UPop
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
CrossGET
[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
ReLA
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation