DeQA-Score

[CVPR 2025] Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution

1 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; 2 Multimedia Laboratory, The Chinese University of Hong Kong; 3 Shanghai AI Laboratory; 4 Shenzhen University of Advanced Technology; 5 CPII under InnoHK
# Corresponding author.
Homepage | Model Weights (Full Tuning / LoRA Tuning) | Datasets | Paper

Motivation

Model Architecture

[Installation Free!] Quicker Start with Hugging Face AutoModel

[2025.12] Thanks to @lyf1212's suggestion, we added support for transformers==4.46.3 with minor code modifications. See details.

The following code runs directly with transformers==4.36.1. There is no need to install this GitHub repo.

import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM

# Load DeQA-Score from the Hugging Face Hub; trust_remote_code loads the custom model class.
model = AutoModelForCausalLM.from_pretrained(
  "zhiyuanyou/DeQA-Score-Mix3",
  trust_remote_code=True,
  attn_implementation="eager",
  torch_dtype=torch.float16,
  device_map="auto",
)

# Fetch an example image; stream=True exposes the raw response for PIL to read.
url = "https://raw.githubusercontent.com/zhiyuanyou/DeQA-Score/main/fig/singapore_flyer.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The input must be a list of PIL images (one or more).
scores = model.score([image])
print(scores)

Installation

If you only need inference or evaluation:

git clone https://github.com/zhiyuanyou/DeQA-Score.git
cd DeQA-Score
pip install -e .

For training, install the additional dependencies as follows:

pip install -e ".[train]"
pip install flash_attn --no-build-isolation

Quick Start

Image Quality Scorer

  • CLI Interface
python src/evaluate/scorer.py --img_path fig/singapore_flyer.jpg
  • Python API
from src import Scorer
from PIL import Image

scorer = Scorer()
img_list = [Image.open("fig/singapore_flyer.jpg")] # can be a list of multiple PIL images
print(scorer(img_list).tolist())

Training, Inference & Evaluation

Datasets

|-- DeQA-Score
|-- Data-DeQA-Score
  |-- KONIQ
    |-- images/*.jpg
    |-- metas
  |-- SPAQ
    |-- images/*.jpg
    |-- metas
  |-- KADID10K
    |-- images/*.png
    |-- metas
  |-- PIPAL
    |-- images/Distortion_*/*.bmp
    |-- metas
  |-- LIVE-WILD
    |-- images/*.bmp
    |-- metas
  |-- AGIQA3K
    |-- images/*.jpg
    |-- metas
  |-- TID2013
    |-- images/distorted_images/*.bmp
    |-- metas
  |-- CSIQ
    |-- images/dst_imgs/*/*.png
    |-- metas
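
Before running inference or training, it can help to confirm that the data folder matches the tree above. The following is a minimal sketch, not part of this repo: the helper name `missing_dataset_dirs` is hypothetical, and it only checks that each dataset has `images` and `metas` directories, assuming `Data-DeQA-Score` sits next to the `DeQA-Score` folder as shown.

```python
from pathlib import Path

# Dataset folder names copied from the directory tree above.
DATASETS = [
    "KONIQ", "SPAQ", "KADID10K", "PIPAL",
    "LIVE-WILD", "AGIQA3K", "TID2013", "CSIQ",
]


def missing_dataset_dirs(data_root):
    """Return the expected images/metas directories that are absent under data_root.

    Hypothetical helper for illustration; it does not inspect file contents.
    """
    root = Path(data_root)
    missing = []
    for name in DATASETS:
        for sub in ("images", "metas"):
            path = root / name / sub
            if not path.is_dir():
                missing.append(path.relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    # Assumes the script is run from inside the DeQA-Score repo folder.
    for entry in missing_dataset_dirs("../Data-DeQA-Score"):
        print("missing:", entry)
```

An empty result means every listed dataset has both subfolders. Datasets you do not plan to use can be removed from `DATASETS`.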

Pretrained Weights

We provide two sets of model weights (full tuning and LoRA tuning) with similar performance.

|  | Training Datasets | Weights |
|---|---|---|
| Full Tuning | KonIQ, SPAQ, KADID | Huggingface Full |
| LoRA Tuning | KonIQ, SPAQ, KADID | Huggingface LoRA |

Download one of the above model weights, then arrange the folders as follows:

|-- DeQA-Score
  |-- checkpoints
    |-- DeQA-Score-Mix3
    |-- DeQA-Score-LoRA-Mix3

To use the LoRA tuning weights, also download the base mPLUG-Owl2 weights from Huggingface mPLUG-Owl2, then arrange the folders as follows:

|-- DeQA-Score
|-- ModelZoo
  |-- mplug-owl2-llama2-7b

Inference

After preparing the datasets, you can run inference with the pre-trained DeQA-Score or DeQA-Score-LoRA:

sh scripts/infer.sh $ONE_GPU_ID
sh scripts/infer_lora.sh $ONE_GPU_ID

Evaluation

After inference, you can evaluate the inference results:

  • SRCC / PLCC for quality score.
sh scripts/eval_score.sh
  • KL Divergence / JS Divergence / Wasserstein Distance for score distribution.
sh scripts/eval_dist.sh
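
For readers unfamiliar with the metrics named above, here is a minimal pure-Python sketch of how they are commonly defined. It is not the code behind `scripts/eval_score.sh` or `scripts/eval_dist.sh`; in particular, it assumes the score distributions are discrete probability vectors over equally spaced quality levels, which may differ from what the repo's scripts compare.

```python
import math


def pearson(x, y):
    """PLCC: Pearson linear correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def average_ranks(values):
    """Rank values from 1..n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """SRCC: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


def js_divergence(p, q):
    """Symmetric JS divergence built from two KL terms against the mixture."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def wasserstein_1d(p, q, spacing=1.0):
    """1-Wasserstein distance on a shared, equally spaced support (CDF difference)."""
    cp = cq = total = 0.0
    for a, b in zip(p, q):
        cp += a
        cq += b
        total += abs(cp - cq) * spacing
    return total
```

SRCC and PLCC compare predicted scores with ground-truth MOS across a dataset, while the three distribution metrics compare one predicted distribution with its ground-truth counterpart.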

Fine-tuning

Fine-tuning requires the mPLUG-Owl2 weights. Download them as described in Pretrained Weights.

LoRA Fine-tuning

  • Only 2 RTX3090 GPUs are required. Revise --data_paths in the training shell to load different datasets. Default training datasets are KonIQ, SPAQ, and KADID.
sh scripts/train_lora.sh $GPU_IDs

Full Fine-tuning from Scratch

  • 8 A6000 GPUs or 4 A100 GPUs are enough. Revise --data_paths in the training shell to load different datasets. Default training datasets are KonIQ, SPAQ, and KADID.
sh scripts/train.sh $GPU_IDs

Soft Label Construction

  • Download split.json (training and test split info) and mos.json (MOS and standard deviation info) for KonIQ, SPAQ, and KADID from Huggingface Metas, and arrange the folders as in Datasets.

  • Run the following script to construct the distribution-based soft labels.

cd build_soft_labels
python gen_soft_label.py
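
To illustrate the idea of a distribution-based soft label, the sketch below discretizes a Gaussian with the image's MOS as mean and its standard deviation into five quality levels. This is an assumption-laden illustration, not the logic of `gen_soft_label.py`: the level centers 1 to 5, the bin width of 1, the plain renormalization step, and the function names are all hypothetical, and the MOS is assumed to be already rescaled to the [1, 5] range with a positive standard deviation.

```python
import math

# Assumed centers of five quality levels (lowest to highest).
LEVEL_CENTERS = [1.0, 2.0, 3.0, 4.0, 5.0]


def normal_cdf(x, mean, std):
    """Cumulative distribution function of N(mean, std^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def soft_label(mos, std, centers=LEVEL_CENTERS, width=1.0):
    """Integrate N(mos, std^2) over each level's bin, then renormalize to sum to 1."""
    raw = [
        normal_cdf(c + width / 2, mos, std) - normal_cdf(c - width / 2, mos, std)
        for c in centers
    ]
    total = sum(raw)  # mass outside [0.5, 5.5] is dropped, so rescale
    return [p / total for p in raw]


def expected_score(probs, centers=LEVEL_CENTERS):
    """Recover a scalar score as the probability-weighted mean of the level centers."""
    return sum(p * c for p, c in zip(probs, centers))


if __name__ == "__main__":
    label = soft_label(mos=3.4, std=0.6)
    print([round(p, 4) for p in label], round(expected_score(label), 4))
```

Simple renormalization does not guarantee that the expected score of the soft label equals the original MOS, especially near the ends of the range; the actual script may apply a different adjustment.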

Acknowledgements

This work is based on Q-Align. We sincerely thank the authors for this awesome work.

Citation

If you find our work useful for your research and applications, please cite using the BibTeX:

@inproceedings{deqa_score,
    title={Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution},
    author={You, Zhiyuan and Cai, Xin and Gu, Jinjin and Xue, Tianfan and Dong, Chao},
    booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    pages={14483--14494},
    year={2025}
}

@article{depictqa_v2,
    title={Enhancing Descriptive Image Quality Assessment with A Large-scale Multi-modal Dataset},
    author={You, Zhiyuan and Gu, Jinjin and Cai, Xin and Li, Zheyuan and Zhu, Kaiwen and Dong, Chao and Xue, Tianfan},
    journal={IEEE Transactions on Image Processing},
    year={2025}
}

@inproceedings{depictqa_v1,
    title={Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models},
    author={You, Zhiyuan and Li, Zheyuan and Gu, Jinjin and Yin, Zhenfei and Xue, Tianfan and Dong, Chao},
    booktitle={European Conference on Computer Vision},
    pages={259--276},
    year={2024}
}