
OneIG-Bench might be a better image generation benchmark?

Open wchengad opened this issue 10 months ago • 0 comments

DPG-Bench introduced dense-prompt evaluation for text-to-image (T2I) model benchmarking and became one of the most widely used benchmarks in the field. However, as stronger image generation models continue to emerge, evaluation needs to extend beyond dense prompts alone: aspects such as stylization, text rendering, reasoning, and multilingual support now require detailed assessment.

To address this, in the newly proposed OneIG-Bench (https://arxiv.org/abs/2506.07977), the authors conduct an Omni-dimensional Nuanced Evaluation for the Image Generation task.

Key Features of OneIG-Bench:

  1. Comprehensive Prompt Sets:

    • Six specialized categories:
      • 245 Anime & Stylization prompts (EN/ZH)
      • 244 Portrait prompts (EN/ZH)
      • 206 General Object prompts (EN/ZH)
      • 200 Text Rendering prompts (EN/ZH)
      • 225 Knowledge & Reasoning prompts (EN/ZH)
      • 200 Multilingualism prompts
    • Bilingual coverage: First five sets available in both English and Chinese
    • Designed for holistic evaluation of modern text-to-image models
  2. Systematic Quantitative Framework:

    • Enables objective capability ranking via standardized metrics
    • Ensures direct cross-model comparability
    • Dimension-specific evaluation protocol:
      • Models generate images only for prompts within one evaluation dimension
      • Performance assessed exclusively within that targeted dimension
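To make the dimension-specific protocol concrete, here is a minimal Python sketch of the idea: a model is scored only on prompts belonging to one evaluation dimension, and results are reported per dimension rather than as a single aggregate. The prompt counts come from the benchmark description above; the `score_image` function and all identifiers are hypothetical placeholders, not OneIG-Bench's actual API or metrics.

```python
# Hypothetical sketch of a dimension-specific evaluation protocol.
# Prompt counts per dimension are taken from the OneIG-Bench description;
# the scoring function is a placeholder, not the benchmark's real metric.

DIMENSIONS = {
    "anime_stylization": 245,
    "portrait": 244,
    "general_object": 206,
    "text_rendering": 200,
    "knowledge_reasoning": 225,
    "multilingualism": 200,
}

def evaluate_model(score_image, prompts_by_dim):
    """Score a model per dimension; prompts are never mixed across dimensions."""
    results = {}
    for dim, prompts in prompts_by_dim.items():
        # Generate/score images only for prompts within this one dimension
        scores = [score_image(dim, p) for p in prompts]
        results[dim] = sum(scores) / len(scores)  # mean score within the dimension
    return results

# Usage with dummy prompts and a constant scorer, just to show the shape
prompts_by_dim = {dim: [f"{dim} prompt {i}" for i in range(n)]
                  for dim, n in DIMENSIONS.items()}
per_dim_scores = evaluate_model(lambda dim, prompt: 0.5, prompts_by_dim)
```

Keeping scores separated by dimension is what enables the direct cross-model comparability mentioned above: two models can be ranked on, say, text rendering without that ranking being diluted by their performance on portraits.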

Here is the evaluation visualization for the most representative SOTA T2I models ⬇️

[Image: evaluation results of representative SOTA T2I models]

wchengad · Jun 10 '25