IQA-PyTorch
LIQE and LIQE-mix return different results for the same image when using a batch
I found this weird behavior in LIQE and LIQE-mix (did not test other metrics, so it could be present elsewhere as well):
import torch
import pyiqa
import matplotlib.cbook as cbook
import matplotlib.pyplot as plt
from torchvision import transforms
from PIL import Image
with cbook.get_sample_data('grace_hopper.jpg') as image_file:
    image_path = image_file.name
# Open the image using PIL
image = Image.open(image_path)
one_image = transforms.ToTensor()(image).unsqueeze(dim=0)
two_images = torch.stack([transforms.ToTensor()(x) for x in [image, image]], dim=0)
five_images = torch.stack([transforms.ToTensor()(x) for x in [image] * 5], dim=0)
iqa_metric = pyiqa.create_metric("liqe_mix")
print(f"Result on file path: {iqa_metric(image_path)}")
print(f"Result on one tensor: {iqa_metric(one_image)}")
print(f"Result on two identical tensors: {iqa_metric(two_images)}")
print(f"Result on five identical tensors: {iqa_metric(five_images)}")
I would expect all results to be identical; instead I get:
Result on file path: tensor([4.9667], device='cuda:0')
Result on one tensor: tensor([4.9667], device='cuda:0')
Result on two identical tensors: tensor([4.9827, 4.9502], device='cuda:0')
Result on five identical tensors: tensor([4.9842, 4.9845, 4.9794, 4.9810, 4.9327], device='cuda:0')
So the same image, repeated within one batch, yields different scores, and none of them matches the result for a single image (whether passed as a path or as a batch of one). The problem gets worse as the batch size increases.
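Until this is fixed, scoring images one at a time matches the single-image results above. A minimal sketch (score_one_by_one is a hypothetical helper, and metric stands for any per-image scoring callable such as the pyiqa metric created above):

```python
import torch

def score_one_by_one(metric, batch):
    """Score each image in a (n, c, h, w) batch independently.

    metric is assumed to map a (1, c, h, w) tensor to a (1,)-shaped score;
    the per-image scores are concatenated into an (n,)-shaped tensor.
    """
    return torch.cat([metric(img.unsqueeze(0)) for img in batch])
```

This sidesteps the batch path entirely, at the cost of one forward pass per image.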
Thanks for the information. There is a bug in the following shape-permutation code: https://github.com/chaofengc/IQA-PyTorch/blob/aac09d01a38f06cae0d9048c945b10e83bae9f21/pyiqa/archs/liqe_arch.py#L105
After unfold, the shape is (b, c, ph, pw, h, w), and the permutation should be (0, 2, 3, 1, 4, 5), which yields (b, ph, pw, c, h, w).
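The correct unfold-then-permute order can be checked with a minimal sketch (the image sizes and patch size here are assumptions for illustration, not the actual LIQE values):

```python
import torch

b, c, H, W = 2, 3, 224, 224  # assumed sizes for illustration
patch = 32

# two copies of the same image in one batch
img = torch.randn(1, c, H, W)
x = img.repeat(b, 1, 1, 1)

# unfold height and width into patches: (b, c, ph, pw, patch, patch)
patches = x.unfold(2, patch, patch).unfold(3, patch, patch)
assert patches.shape == (b, c, H // patch, W // patch, patch, patch)

# correct permutation (0, 2, 3, 1, 4, 5) -> (b, ph, pw, c, patch, patch),
# keeping each patch's channels together and patches grouped per image
patches = patches.permute(0, 2, 3, 1, 4, 5).contiguous()

# flattening the patch dims now yields identical patch sets for the
# two identical images, so per-image scores stay deterministic
n = (H // patch) * (W // patch)
flat = patches.reshape(b, n, c, patch, patch)
assert torch.equal(flat[0], flat[1])
```

With a wrong permutation, flattening would interleave patches across batch elements, which is why identical images in one batch received different scores.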
Since the scores are calculated patch-wise, this bug only makes a difference when inferring with batch size > 1.
A similar bug is unlikely to be present in other metrics.