Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution

Zhiyuan You¹², Xin Cai², Jinjin Gu⁴, Tianfan Xue²³⁵^#, Chao Dong¹³⁴^#

¹Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, ²Multimedia Laboratory, The Chinese University of Hong Kong, ³Shanghai AI Laboratory, ⁴Shenzhen University of Advanced Technology, ⁵CPII under InnoHK

^#Corresponding author.

Homepage | Model Weights ( Full Tuning / LoRA Tuning ) | Datasets | Paper

Motivation

Model Architecture

[Installation Free!] Quicker Start with Hugging Face AutoModel

[2025.12] Thanks to @lyf1212's suggestion, we added support for transformers==4.46.3 with minor code modifications. See details.

The following code can be run directly with transformers==4.36.1, without installing this GitHub repo.

import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM

# trust_remote_code=True lets transformers use the custom model code hosted with the weights
model = AutoModelForCausalLM.from_pretrained(
  "zhiyuanyou/DeQA-Score-Mix3",
  trust_remote_code=True,
  attn_implementation="eager",
  torch_dtype=torch.float16,
  device_map="auto",
)

# Download the example image and open it as a PIL image
url = "https://raw.githubusercontent.com/zhiyuanyou/DeQA-Score/main/fig/singapore_flyer.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The input must be a list of PIL images (one or more)
model.score([image])

Installation

If you only need to infer / evaluate:

git clone https://github.com/zhiyuanyou/DeQA-Score.git
cd DeQA-Score
pip install -e .

For training, install the additional dependencies as follows:

pip install -e ".[train]"
pip install flash_attn --no-build-isolation

Quick Start

Image Quality Scorer

CLI Interface

python src/evaluate/scorer.py --img_path fig/singapore_flyer.jpg

Python API

from src import Scorer
from PIL import Image

scorer = Scorer()
img_list = [Image.open("fig/singapore_flyer.jpg")] # can be a list of multiple PIL images
print(scorer(img_list).tolist())

Training, Inference & Evaluation

Datasets

Download our meta files from Huggingface Metas.
Download source images from KonIQ, SPAQ, KADID, PIPAL, LIVE-Wild, AGIQA, TID2013, and CSIQ.
Arrange the folders as follows:

|-- DeQA-Score
|-- Data-DeQA-Score
  |-- KONIQ
    |-- images/*.jpg
    |-- metas
  |-- SPAQ
    |-- images/*.jpg
    |-- metas
  |-- KADID10K
    |-- images/*.png
    |-- metas
  |-- PIPAL
    |-- images/Distortion_*/*.bmp
    |-- metas
  |-- LIVE-WILD
    |-- images/*.bmp
    |-- metas
  |-- AGIQA3K
    |-- images/*.jpg
    |-- metas
  |-- TID2013
    |-- images/distorted_images/*.bmp
    |-- metas
  |-- CSIQ
    |-- images/dst_imgs/*/*.png
    |-- metas
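
Before running inference or training, it can help to confirm that the folders match the tree above. The following is a minimal sketch and is not part of the repository: `check_layout` and `EXPECTED` are hypothetical names, and the glob patterns are copied from the tree. It assumes the file extensions are lowercase, as shown.

```python
from pathlib import Path

# Image glob patterns copied from the folder tree above (relative to each dataset folder).
EXPECTED = {
    "KONIQ": "images/*.jpg",
    "SPAQ": "images/*.jpg",
    "KADID10K": "images/*.png",
    "PIPAL": "images/Distortion_*/*.bmp",
    "LIVE-WILD": "images/*.bmp",
    "AGIQA3K": "images/*.jpg",
    "TID2013": "images/distorted_images/*.bmp",
    "CSIQ": "images/dst_imgs/*/*.png",
}


def check_layout(data_root):
    """Return a list of human-readable problems; an empty list means the layout looks fine."""
    root = Path(data_root)
    problems = []
    for name, pattern in EXPECTED.items():
        dataset_dir = root / name
        if not dataset_dir.is_dir():
            problems.append(f"{name}: folder missing")
            continue
        if not (dataset_dir / "metas").is_dir():
            problems.append(f"{name}: metas folder missing")
        # next(..., None) stops at the first match instead of listing every image.
        if next(dataset_dir.glob(pattern), None) is None:
            problems.append(f"{name}: no images matching {pattern}")
    return problems
```

Run from inside DeQA-Score, `check_layout("../Data-DeQA-Score")` would list any dataset folder that is missing or empty. Datasets you do not plan to use can be ignored.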

Pretrained Weights

We provide two sets of model weights (full tuning and LoRA tuning) with similar performance.

| Tuning Type | Training Datasets | Weights |
| --- | --- | --- |
| Full Tuning | KonIQ, SPAQ, KADID | Huggingface Full |
| LoRA Tuning | KonIQ, SPAQ, KADID | Huggingface LoRA |

Download one of the above model weights, then arrange the folders as follows:

|-- DeQA-Score
  |-- checkpoints
    |-- DeQA-Score-Mix3
    |-- DeQA-Score-LoRA-Mix3

To use the LoRA tuning weights, you also need to download the base mPLUG-Owl2 weights from Huggingface mPLUG-Owl2, then arrange the folders as follows:

|-- DeQA-Score
|-- ModelZoo
  |-- mplug-owl2-llama2-7b

Inference

After preparing the datasets, you can run inference with the pre-trained DeQA-Score or DeQA-Score-LoRA:

sh scripts/infer.sh $ONE_GPU_ID

sh scripts/infer_lora.sh $ONE_GPU_ID

Evaluation

After inference, you can evaluate the inference results:

SRCC / PLCC for quality score.

sh scripts/eval_score.sh

KL Divergence / JS Divergence / Wasserstein Distance for score distribution.

sh scripts/eval_dist.sh
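
The three distribution metrics can be sketched as follows, under the assumption that both the predicted and the ground-truth distributions are discrete probability vectors over the same equally spaced score levels. The representation used by `scripts/eval_dist.sh` may differ, and the function names and the `EPS` smoothing constant are illustrative choices, not taken from the repository.

```python
import numpy as np

EPS = 1e-12  # avoids log(0); an illustrative choice


def _normalize(p):
    """Add a tiny constant and rescale so the vector sums to 1."""
    p = np.asarray(p, dtype=np.float64) + EPS
    return p / p.sum()


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    p, q = _normalize(p), _normalize(q)
    return float(np.sum(p * np.log(p / q)))


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by log(2)."""
    p, q = _normalize(p), _normalize(q)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def wasserstein_1d(p, q, spacing=1.0):
    """1-Wasserstein distance on equally spaced levels: summed absolute CDF difference."""
    p, q = _normalize(p), _normalize(q)
    return float(spacing * np.sum(np.abs(np.cumsum(p) - np.cumsum(q))))
```

Unlike the KL and JS divergences, the Wasserstein distance accounts for how far apart the levels are: moving all probability mass from the lowest to the highest of five levels costs 4 level widths, while moving it to the adjacent level costs 1.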

Fine-tuning

Fine-tuning requires the mPLUG-Owl2 weights. Download them as described in Pretrained Weights.

LoRA Fine-tuning

Only 2 RTX 3090 GPUs are required. Revise --data_paths in the training shell script to load different datasets. The default training datasets are KonIQ, SPAQ, and KADID.

sh scripts/train_lora.sh $GPU_IDs

Full Fine-tuning from Scratch

At least 8 A6000 GPUs or 4 A100 GPUs are required. Revise --data_paths in the training shell script to load different datasets. The default training datasets are KonIQ, SPAQ, and KADID.

sh scripts/train.sh $GPU_IDs

Soft Label Construction

Download split.json (training & test split info) and mos.json (mos & std info) of KonIQ, SPAQ, and KADID from Huggingface Metas, and arrange the folders as in Datasets.
Run the following script to construct the distribution-based soft labels.

cd build_soft_labels
python gen_soft_label.py
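
To illustrate how a soft label can be derived from the mos & std info, the sketch below discretizes a Gaussian with the given mean and standard deviation into five rating levels. This is an assumption-laden illustration and not the contents of `gen_soft_label.py`: it assumes MOS has already been rescaled to the range [1, 5], that the levels are centered at 1 to 5 with a bin width of 1, and that mass outside the bins is handled by renormalizing. The actual script may treat any of these differently, for example by correcting the mean afterwards. All names are hypothetical.

```python
import math

LEVEL_CENTERS = [1.0, 2.0, 3.0, 4.0, 5.0]  # assumed five-level scale


def _normal_cdf(x, mean, std):
    """Cumulative distribution function of N(mean, std^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def soft_label(mos, std, half_width=0.5):
    """Discretize N(mos, std^2) into one probability per level, renormalized to sum to 1."""
    if std <= 0:
        raise ValueError("std must be positive")
    # Probability mass of the Gaussian inside each bin [c - half_width, c + half_width].
    probs = [
        _normal_cdf(c + half_width, mos, std) - _normal_cdf(c - half_width, mos, std)
        for c in LEVEL_CENTERS
    ]
    total = sum(probs)
    return [p / total for p in probs]
```

For example, `soft_label(3.0, 0.5)` puts most of the probability on the middle level and spreads the rest symmetrically over its neighbors, whereas a one-hot label would discard the spread of the human ratings.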

Acknowledgements

This work is based on Q-Align. We sincerely thank the authors for this awesome work.

Citation

If you find our work useful for your research and applications, please cite using the BibTeX:

@inproceedings{deqa_score,
    title={Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution},
    author={You, Zhiyuan and Cai, Xin and Gu, Jinjin and Xue, Tianfan and Dong, Chao},
    booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    pages={14483--14494},
    year={2025}
}

@article{depictqa_v2,
    title={Enhancing Descriptive Image Quality Assessment with A Large-scale Multi-modal Dataset},
    author={You, Zhiyuan and Gu, Jinjin and Cai, Xin and Li, Zheyuan and Zhu, Kaiwen and Dong, Chao and Xue, Tianfan},
    journal={IEEE Transactions on Image Processing},
    year={2025}
}

@inproceedings{depictqa_v1,
    title={Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models},
    author={You, Zhiyuan and Li, Zheyuan and Gu, Jinjin and Yin, Zhenfei and Xue, Tianfan and Dong, Chao},
    booktitle={European Conference on Computer Vision},
    pages={259--276},
    year={2024}
}
