Vision and Language Group@ MIL
mcan-vqa
Deep Modular Co-Attention Networks for Visual Question Answering
openvqa
A lightweight, scalable, and general framework for visual question answering research
bottom-up-attention.pytorch
A PyTorch reimplementation of bottom-up-attention models
activitynet-qa
A VideoQA dataset based on the videos from ActivityNet
mt-captioning
A PyTorch implementation of the paper "Multimodal Transformer with Multiview Visual Representation for Image Captioning"
rosita
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
prophet
Implementation of the CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering"