
[add?] new facial iqa score

Open abcnorio opened this issue 1 year ago • 7 comments

Hello,

last month the code for a new facial iqa score was published:

code: https://github.com/DSL-FIQA/DSL-FIQA

research paper: https://arxiv.org/pdf/2406.09622

Maybe it is worth adding to IQA-PyTorch.

abcnorio avatar Oct 22 '24 09:10 abcnorio

Thank you for the information. I'll look at this metric and consider adding it when I have the time.

chaofengc avatar Oct 22 '24 12:10 chaofengc

cool, it looks like it requires some effort: before the metric calculation, some facial landmark detection must take place (according to the research paper and the GitHub repo) if inference on custom data is the goal: https://github.com/DSL-FIQA/DSL-FIQA/tree/main/landmark_detection

abcnorio avatar Oct 22 '24 12:10 abcnorio

Actually, there is a similar metric in our toolbox, topiq_nr-face. You may use it currently.

chaofengc avatar Oct 22 '24 12:10 chaofengc

Great, thanks for pointing that out, will try that!

abcnorio avatar Oct 22 '24 12:10 abcnorio

topiq_nr-face -> tried that out; it breaks if no face is visible. Is it possible to avoid the break and just emit a warning along with some NA value? As it stands, if a dataset has at least one photo without a face, the run breaks, and that requires a lot of manual work for bigger datasets. NA would be good enough to signal "no face", and one can handle it like any other NA.

abcnorio avatar Oct 24 '24 17:10 abcnorio

Please consider handling these cases manually in Python using try...except blocks. Returning NA may lead to incorrect or misleading error messages in reports.

chaofengc avatar Oct 25 '24 05:10 chaofengc

Thanks, understood. Yes, handling NA is sometimes difficult for a lot of functions (coming from R, where some respect NAs and some do not; I assume it's the same with Python). So this line (in the inference script)

score = iqa_model(img_path, ref_img_path).cpu().item()

will be the right place to start, correct, since that's where the actual score calculation happens? Later it can be handled with 0 instead of NA, and one can report the filenames and number of images that could not be scored.
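A minimal sketch of the `try...except` wrapper discussed above; `safe_score` and the NaN sentinel are illustrative, not part of the toolbox:

```python
import math

def safe_score(score_fn, img_path):
    """Run a metric on one image, returning NaN instead of raising,
    e.g. when no face is detected or the GPU runs out of memory."""
    try:
        return score_fn(img_path)
    except Exception as exc:  # face-detection failure, OOM, unreadable file, ...
        print(f"WARNING: could not score {img_path}: {exc}")
        return float("nan")

# In the inference script this would wrap the scoring line, roughly:
#   score = safe_score(lambda p: iqa_model(p).cpu().item(), img_path)
```

Afterwards the NaNs can be replaced by 0 and the affected filenames counted, as suggested above.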

abcnorio avatar Oct 25 '24 06:10 abcnorio

I have evaluated the performance of topiq_nr-face on the DSL-FIQA test set. Despite being trained on GFIQA-20K, topiq_nr-face demonstrates results comparable to DSL-FIQA. Given that DSL-FIQA requires additional landmark detection, I believe its inclusion in our toolbox may be unnecessary.

However, I plan to re-train topiq_nr-face with the larger dataset proposed by DSL-FIQA to improve performance further.

| Method | PLCC | SRCC |
| --- | --- | --- |
| DSL-FIQA | 0.9873 | 0.9880 |
| topiq_nr-face (trained on GFIQA) | 0.9641 | 0.9736 |

To reproduce these results, please first download the CGFIQA dataset as well as the meta information file from:

  • https://huggingface.co/datasets/chaofengc/IQA-PyTorch-Datasets
  • https://huggingface.co/datasets/chaofengc/IQA-PyTorch-Datasets-metainfo

Then use the following command:

python benchmark_results.py -m topiq_nr-face -d cgfiqa --use_gpu

chaofengc avatar Oct 27 '24 00:10 chaofengc


Thanks - very valuable information. Before I can run it on our dataset, I need to add the try...except add-on so it does not break on every image without a detected face. It will output a '0' or '1' for successful application of the model to an image (breaks can also be OOM, as mentioned earlier), so it can be used not just for face-detection failure but for general failure. The output will contain that flag and can be used for further analysis.

Then I use a small script to evaluate the results (written in R, but only because that's my background); please see the attached example. It's a highly reduced EDA, but it works quite well for me. Ignore the wrong caption at the bottom (it's raw file size, not MB; already fixed in the script). Maybe you can add a note on your page that certain analyses and graphs help with (pre-)selection of images (e.g. for upscaling) in line with visual inspection. One could also cluster in 3D, identify prototypical images for a dataset, or create heatmaps along with content-related categories, but all that requires a dataset with a limited set of tags attached to the images, not just raw scores. Most datasets won't be prepared that way.

best wishes


abcnorio avatar Oct 28 '24 10:10 abcnorio

For assistance with writing Python code, please consider reaching out to ChatGPT or Claude; they can provide the support you need.

For other issues, they may be outside the scope of this repository. Apologies that I can’t assist further on those matters.

chaofengc avatar Oct 28 '24 11:10 chaofengc