DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers

Authors: Jaemin Cho, Abhay Zala, and Mohit Bansal (UNC Chapel Hill)
Paper

Visual Reasoning

Please see ./paintskills for our DETR-based visual reasoning skill evaluation.

(Optional) Please see https://github.com/aszala/PaintSkills-Simulator for our 3D Simulator implementation.

Image Quality & Image-Text Alignment

Please see ./quality for our image quality evaluation based on FID score.

Please see ./retrieval for our image-text alignment evaluation with CLIP-based R-precision.
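As a rough illustration of what an R-precision score measures, the sketch below ranks every candidate caption against each image by cosine similarity and counts how often the matching caption ranks first. It is not the code in ./retrieval: the function names `cosine` and `r_precision` are hypothetical, the toy 2-D vectors stand in for CLIP image and text embeddings that are assumed to be precomputed, and the setup assumes that caption i is the ground truth for image i.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def r_precision(image_embs, text_embs):
    """Fraction of images whose matching caption (same index) scores highest.

    Assumption: text_embs[i] is the ground-truth caption for image_embs[i],
    and every caption in text_embs is a retrieval candidate for every image.
    """
    hits = 0
    for i, img in enumerate(image_embs):
        scores = [cosine(img, txt) for txt in text_embs]
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += int(best == i)
    return hits / len(image_embs)


# Toy stand-ins for CLIP embeddings (hypothetical values, for illustration only).
image_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_embs = [[0.9, 0.1], [0.2, 0.8], [1.0, -1.0]]

score = r_precision(image_embs, text_embs)
print(f"R-precision: {score:.3f}")
```

In this toy example the third image retrieves the wrong caption, so two of the three images count as hits.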

Please see ./captioning for our image-text alignment evaluation with VL-T5 captioning.

Social Bias

Please see ./biases for our CLIP-based social (gender and racial) bias evaluation.

Models

We provide training and inference scripts for DALLE-small (DALLE-pytorch), ruDALL-E XL, minDALL-E, and X-LXMERT.

Acknowledgments

We thank the developers of DETR, DALLE-pytorch, ruDALL-E, minDALL-E, and X-LXMERT for publicly releasing their code.

Reference

Please cite our paper if you use our dataset in your work:


@article{Cho2022DallEval,
  title         = {DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers},
  author        = {Jaemin Cho and Abhay Zala and Mohit Bansal},
  year          = {2022},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  eprint        = {2202.04053}
}
