vlms topics

HallusionBench

228 stars, 5 forks

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

tianyi-lab

benchmark

benchmarks

gpt-4

gpt-4v

openai-scala-client

240 stars, 36 forks, 240 watchers

Scala client for OpenAI API and other major LLM providers

cequence-io

chatgpt

dall-e

gpt-3

gpt-4

ViTamin

210 stars, 6 forks, 210 watchers

[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"

Beckschen

scalable-vision-encoder

vlms

CAL

59 stars, 2 forks, 59 watchers

[NeurIPS'24] Official PyTorch implementation of "Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment"

foundation-multimodal-models

contrastive-alignment

vlms

AWT

70 stars, 1 fork

[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation

MCG-NJU

clip

computer-vision

open-set-recognition

siglip

docext

1.8k stars, 137 forks, 1.8k watchers

An on-premises, OCR-free toolkit for unstructured data extraction, markdown conversion, and benchmarking. (https://idp-leaderboard.org/)

NanoNets

document

document-analysis

document-data-extraction

document-information-extraction

OmniCaptioner

168 stars, 14 forks, 168 watchers

Official repository of OmniCaptioner

InternScience

caption-generation

captioning-images

deepseek-r1

multi-modal