Vision and Language Group@ MIL

10 repositories owned by Vision and Language Group@ MIL

mcan-vqa (432 stars, 88 forks)

Deep Modular Co-Attention Networks for Visual Question Answering

openvqa (309 stars, 64 forks)

A lightweight, scalable, and general framework for visual question answering research

bottom-up-attention.pytorch (289 stars, 74 forks)

A PyTorch reimplementation of bottom-up-attention models

activitynet-qa (55 stars, 9 forks)

A VideoQA dataset based on the videos from ActivityNet

mmnas (25 stars, 8 forks)

Deep Multimodal Neural Architecture Search

mt-captioning (24 stars, 7 forks)

A PyTorch implementation of the paper "Multimodal Transformer with Multiview Visual Representation for Image Captioning"

rosita (55 stars, 13 forks)

ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration

prophet (261 stars, 27 forks)

Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

imp (85 stars, 8 forks)

A family of multimodal small language models

xmchat (30 stars, 2 forks)