f3d icon indicating copy to clipboard operation
f3d copied to clipboard

Change image comparison to PSNR

Open Meakk opened this issue 2 years ago • 10 comments
trafficstars

Meakk avatar Oct 13 '23 07:10 Meakk

You are modifying libf3d public API! :warning:Please update bindings accordingly:warning:! You can find them in their respective directories: python, java, webassembly.

github-actions[bot] avatar Oct 13 '23 07:10 github-actions[bot]

Codecov Report

Merging #1051 (3ceadb2) into master (07d6c6e) will decrease coverage by 0.03%. Report is 2 commits behind head on master. The diff coverage is 98.48%.

@@            Coverage Diff             @@
##           master    #1051      +/-   ##
==========================================
- Coverage   96.22%   96.19%   -0.03%     
==========================================
  Files         120      120              
  Lines        7015     7069      +54     
==========================================
+ Hits         6750     6800      +50     
- Misses        265      269       +4     
Files Coverage Δ
application/F3DStarter.cxx 97.83% <100.00%> (ø)
...VTKExtensions/Rendering/vtkF3DOpenGLGridMapper.cxx 99.00% <100.00%> (+0.03%) :arrow_up:
...y/VTKExtensions/Rendering/vtkF3DOpenGLGridMapper.h 100.00% <100.00%> (ø)
library/VTKExtensions/Rendering/vtkF3DRenderer.cxx 99.86% <100.00%> (+<0.01%) :arrow_up:
library/src/image.cxx 96.29% <97.95%> (-1.61%) :arrow_down:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Oct 13 '23 21:10 codecov[bot]

Im a bit confused as to why you could remove hight thresold for vtk comparison ? is PSNR less sensible than what VTK is doing ?

Looking at the code, PSNR seems a much simpler algorithm, maybe it is better but how does it change our testing, what makes this an improvement ?

mwestphal avatar Oct 14 '23 08:10 mwestphal

If Im not mistaken VTK is implementing a form of SSIM. Why do you thing PNSR is a better metric ?

mwestphal avatar Oct 14 '23 08:10 mwestphal

Honestly, VTK's algorithm is a bit confusing, I have no idea what it is currently doing, but it does not look like SSIM since it is supposed to return a value between 0 and 1.
Moreover, we had example in the past where VTK's comparison had failed to detect a rendering mismatch.
PSNR is quite standard (SSIM too I agree), simple to implement (I've check my implementation returns the same value than image magick), so it is returning a meaningful value for the user.
Moreover, I think the API is cleaner by splitting the metric computation and the difference image in two distinct APIs.
It happened that on our test base, a large majority of tests return an expected value (above 30, often above 50), and only some of them need a smaller threshold.

From this paper: https://projet.liris.cnrs.fr/imagine/pub/proceedings/ICPR-2010/data/4109c366.pdf

The study has revealed that the PSNR is more sensitive to additive Gaussian noise than the SSIM, while the opposite is observed for jpeg compression.

That explains why I had to lower the threshold for raytracing tests since I think raytracing introduced noise similar to gaussian noise.

In the end, I think it would be great to have different metrics, and choose it in the tests (use SSIM for raytracing maybe?) but I dislike VTK's metric.

Meakk avatar Oct 14 '23 21:10 Meakk

VTK's algorithm is a bit confusing,

Agreed

I have no idea what it is currently doing, but it does not look like SSIM since it is supposed to return a value between 0 and 1.

It may be related in the sense that it looks for similar pixel in a square of pixels, but it is definitely not a specific SSIM algorithm we can pin point.Keep in mind it was written probably before these concepts were fully created.

Moreover, we had example in the past where VTK's comparison had failed to detect a rendering mismatch.

It was when I broke it, but I fixed it since then.

I dislike VTK's metric.

Me too, and I also agree that having a standard metric would be much beneficial.

I also read that PSNR is specialized to evalauted the quality of lossy compression, which is not at all what we are working with.

At the end of the day, what matters to me is that our CI do not become less sensitive AND do not produce more false positive. In that objective, could you show image difference that required you to set a lower PSNR thresh ? Also Some image were very different and required a high VTK threshold, but it doesnt seem needed anymore, could you share such images that are not flagged as different by PSNR ?

mwestphal avatar Oct 15 '23 09:10 mwestphal

BTW Yohann is working on implementing proper SSIM in VTK

https://gitlab.kitware.com/vtk/vtk/-/merge_requests/10508

I tagged you on it but I should have discussed directly before you started on this I suppose

mwestphal avatar Oct 15 '23 09:10 mwestphal

At the end of the day, what matters to me is that our CI do not become less sensitive AND do not produce more false positive.

I agree, let me write a script to generate a table of generated+ref+diff+psnr for each test in order to evaluate PSNR properly.

Meakk avatar Oct 15 '23 18:10 Meakk

@Meakk with SSIM being added to VTK, I'm not sure we want to move forward on this one.

mwestphal avatar Nov 11 '23 11:11 mwestphal

Ok, let me know when everything is in place in VTK and we can discuss about the next steps. We can keep this PR opened for now.

Meakk avatar Nov 11 '23 12:11 Meakk