# Uniaa: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark
The Unified Multi-modal Image Aesthetic Assessment Framework, containing a baseline (a) and a benchmark (b). The aesthetic perception performance of UNIAA-LLaVA and other MLLMs is shown in (c).
The IAA Datasets Conversion Paradigm for UNIAA-LLaVA.
The UNIAA-Bench overview. (a) UNIAA-QA contains 5354 Image-Question-Answer samples and (b) UNIAA-Describe contains 501 Image-Description samples. (c) For open-source MLLMs, logits can be extracted to calculate the score.
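The logit-based scoring in (c) can be sketched as follows: take the model's output logits over a fixed set of rating words, apply a softmax, and form a probability-weighted average of the corresponding numeric ratings. This is a minimal illustration; the rating vocabulary and five-level scale below are assumptions, not necessarily the exact token set used by UNIAA-Bench.

```python
import math

# Hypothetical five-level rating words and their numeric values;
# the actual word set used by the benchmark may differ.
RATING_WORDS = ["bad", "poor", "fair", "good", "excellent"]
RATING_VALUES = [1.0, 2.0, 3.0, 4.0, 5.0]

def logits_to_score(logits):
    """Softmax over the rating-word logits, then a probability-weighted
    average of the rating values -> a continuous aesthetic score in [1, 5]."""
    m = max(logits)                               # numerically stable softmax
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sum(p * v for p, v in zip(probs, RATING_VALUES))

# Example: logits that favor the word "good"
score = logits_to_score([-2.0, -1.0, 0.5, 3.0, 1.0])
```

With uniform logits the score falls at the midpoint of the scale, so the weighted average behaves as expected at the extremes and in between.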
## Release
- [4/15] 🔥 We built the project page of UNIAA!
## Performance
### Aesthetic Perception Performance
![](https://github.com/KwaiVGI/Uniaa/raw/main/imgs/perception.png)
### Aesthetic Description Performance
![](https://github.com/KwaiVGI/Uniaa/raw/main/imgs/description.png)
### Aesthetic Assessment Performance
#### Zero-shot
![](https://github.com/KwaiVGI/Uniaa/raw/main/imgs/zero-shot-assessment.png)
#### Supervised learning on AVA and TAD66K
![](https://github.com/KwaiVGI/Uniaa/raw/main/imgs/superivised-learning-assessment.png)
## Training on UNIAA data
### Step 1: Download the images and JSON files
### Step 2: Train on a specific MLLM
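The IAA Datasets Conversion Paradigm shown above maps plain IAA annotations (an image path plus a mean opinion score) into instruction-tuning records. The sketch below assumes a simple `(path, MOS)` annotation format and follows the common LLaVA conversation JSON schema; the question wording and the score-to-word binning are illustrative, not the paper's exact mapping.

```python
import json

def mos_to_word(mos):
    """Map a mean opinion score (assumed on a 1-5 scale) to a rating word.
    The thresholds are illustrative, not the paper's exact binning."""
    words = ["bad", "poor", "fair", "good", "excellent"]
    idx = min(int(mos - 1), 4) if mos >= 1 else 0
    return words[max(idx, 0)]

def to_llava_sample(image_path, mos, sample_id):
    """Wrap one IAA annotation as a LLaVA-style conversation record."""
    return {
        "id": sample_id,
        "image": image_path,
        "conversations": [
            {"from": "human",
             "value": "<image>\nHow would you rate the aesthetics of this image?"},
            {"from": "gpt",
             "value": f"The aesthetic quality of this image is {mos_to_word(mos)}."},
        ],
    }

# Hypothetical annotations: (image path, mean opinion score)
annotations = [("imgs/0001.jpg", 4.2), ("imgs/0002.jpg", 2.1)]
records = [to_llava_sample(p, s, i) for i, (p, s) in enumerate(annotations)]
print(json.dumps(records, indent=2))
```

Writing the resulting list to a single JSON file yields data in the shape that LLaVA-style training scripts typically consume.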
## Test on UNIAA-Bench
### For Aesthetic Perception
#### Step 1: Download the images and JSON files
#### Step 2: Run the inference code
#### Step 3: Calculate the score
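For the perception track, Step 3 amounts to accuracy over the Image-Question-Answer samples: compare each model prediction against the ground-truth answer. A minimal sketch; the result-record field names (`prediction`, `answer`) are assumptions, not the benchmark's actual schema.

```python
def perception_accuracy(results):
    """Fraction of correct answers; comparison is case-insensitive and
    ignores surrounding whitespace. `results` is a list of dicts with
    hypothetical `prediction` and `answer` fields."""
    if not results:
        return 0.0
    correct = sum(
        1 for r in results
        if r["prediction"].strip().lower() == r["answer"].strip().lower()
    )
    return correct / len(results)

# Toy example: 2 of 3 predictions match the ground truth
demo = [
    {"prediction": "B", "answer": "B"},
    {"prediction": "A", "answer": "C"},
    {"prediction": "good", "answer": "Good"},
]
acc = perception_accuracy(demo)
```

Per-dimension or per-question-type accuracy follows the same pattern by first grouping the result records before averaging.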
### For Aesthetic Description
#### Step 1: Download the images and JSON files
#### Step 2: Run the inference code
## Citation
If you find UNIAA useful for your research and applications, please cite it using this BibTeX:
```bibtex
@misc{zhou2024uniaa,
      title={UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark},
      author={Zhaokun Zhou and Qiulin Wang and Bin Lin and Yiwei Su and Rui Chen and Xin Tao and Amin Zheng and Li Yuan and Pengfei Wan and Di Zhang},
      year={2024},
      eprint={2404.09619},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
## Contact
If you have any questions, please feel free to email [email protected] and [email protected].