f3d
f3d copied to clipboard
Change image comparison to PSNR
You are modifying libf3d public API! :warning:Please update bindings accordingly:warning:!
You can find them in their respective directories: python, java, webassembly.
Codecov Report
Merging #1051 (3ceadb2) into master (07d6c6e) will decrease coverage by
0.03%. Report is 2 commits behind head on master. The diff coverage is98.48%.
@@ Coverage Diff @@
## master #1051 +/- ##
==========================================
- Coverage 96.22% 96.19% -0.03%
==========================================
Files 120 120
Lines 7015 7069 +54
==========================================
+ Hits 6750 6800 +50
- Misses 265 269 +4
| Files | Coverage Δ | |
|---|---|---|
| application/F3DStarter.cxx | 97.83% <100.00%> (ø) |
|
| ...VTKExtensions/Rendering/vtkF3DOpenGLGridMapper.cxx | 99.00% <100.00%> (+0.03%) |
:arrow_up: |
| ...y/VTKExtensions/Rendering/vtkF3DOpenGLGridMapper.h | 100.00% <100.00%> (ø) |
|
| library/VTKExtensions/Rendering/vtkF3DRenderer.cxx | 99.86% <100.00%> (+<0.01%) |
:arrow_up: |
| library/src/image.cxx | 96.29% <97.95%> (-1.61%) |
:arrow_down: |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
Im a bit confused as to why you could remove hight thresold for vtk comparison ? is PSNR less sensible than what VTK is doing ?
Looking at the code, PSNR seems a much simpler algorithm, maybe it is better but how does it change our testing, what makes this an improvement ?
If Im not mistaken VTK is implementing a form of SSIM. Why do you thing PNSR is a better metric ?
Honestly, VTK's algorithm is a bit confusing, I have no idea what it is currently doing, but it does not look like SSIM since it is supposed to return a value between 0 and 1.
Moreover, we had example in the past where VTK's comparison had failed to detect a rendering mismatch.
PSNR is quite standard (SSIM too I agree), simple to implement (I've check my implementation returns the same value than image magick), so it is returning a meaningful value for the user.
Moreover, I think the API is cleaner by splitting the metric computation and the difference image in two distinct APIs.
It happened that on our test base, a large majority of tests return an expected value (above 30, often above 50), and only some of them need a smaller threshold.
From this paper: https://projet.liris.cnrs.fr/imagine/pub/proceedings/ICPR-2010/data/4109c366.pdf
The study has revealed that the PSNR is more sensitive to additive Gaussian noise than the SSIM, while the opposite is observed for jpeg compression.
That explains why I had to lower the threshold for raytracing tests since I think raytracing introduced noise similar to gaussian noise.
In the end, I think it would be great to have different metrics, and choose it in the tests (use SSIM for raytracing maybe?) but I dislike VTK's metric.
VTK's algorithm is a bit confusing,
Agreed
I have no idea what it is currently doing, but it does not look like SSIM since it is supposed to return a value between 0 and 1.
It may be related in the sense that it looks for similar pixel in a square of pixels, but it is definitely not a specific SSIM algorithm we can pin point.Keep in mind it was written probably before these concepts were fully created.
Moreover, we had example in the past where VTK's comparison had failed to detect a rendering mismatch.
It was when I broke it, but I fixed it since then.
I dislike VTK's metric.
Me too, and I also agree that having a standard metric would be much beneficial.
I also read that PSNR is specialized to evalauted the quality of lossy compression, which is not at all what we are working with.
At the end of the day, what matters to me is that our CI do not become less sensitive AND do not produce more false positive. In that objective, could you show image difference that required you to set a lower PSNR thresh ? Also Some image were very different and required a high VTK threshold, but it doesnt seem needed anymore, could you share such images that are not flagged as different by PSNR ?
BTW Yohann is working on implementing proper SSIM in VTK
https://gitlab.kitware.com/vtk/vtk/-/merge_requests/10508
I tagged you on it but I should have discussed directly before you started on this I suppose
At the end of the day, what matters to me is that our CI do not become less sensitive AND do not produce more false positive.
I agree, let me write a script to generate a table of generated+ref+diff+psnr for each test in order to evaluate PSNR properly.
@Meakk with SSIM being added to VTK, I'm not sure we want to move forward on this one.
Ok, let me know when everything is in place in VTK and we can discuss about the next steps. We can keep this PR opened for now.