vision-and-language topic

List vision-and-language repositories

lnfmm

34
Stars
12
Forks
Watchers

Latent Normalizing Flows for Many-to-Many Cross Domain Mappings (ICLR 2020)

SpaCap3D

19
Stars
5
Forks
Watchers

[IJCAI 2022] Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds (official PyTorch implementation)

Awesome-Colorful-LLM

104
Stars
6
Forks
Watchers

A collection of recent advancements propelled by large language models (LLMs), spanning domains including Vision, Audio, Agent, Robotics, and Fundamental Sciences such as Mathematics.

STL-VQA

20
Stars
3
Forks
Watchers

Good practices from this VQA system, such as POS-tag attention, structured triplet learning, and triplet attention, are general and can be inserted into almost any vision-and-language task.

CPL

31
Stars
4
Forks
Watchers

Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"

Vote2Cap-DETR

100
Stars
9
Forks
100
Watchers

[T-PAMI 2024] & [CVPR 2023] Vote2Cap-DETR; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D Dense Captioning methods

Aerial-Vision-and-Dialog-Navigation

31
Stars
6
Forks
Watchers

Codebase of ACL 2023 Findings "Aerial Vision-and-Dialog Navigation"

TGN

17
Stars
4
Forks
Watchers

TensorFlow reproduction of the EMNLP 2018 paper "Temporally Grounding Natural Sentence in Video"

awesome-vqa-latest

50
Stars
9
Forks
Watchers

Visual Question Answering Paper List.