Improve visual parity tests by adding a threshold
Is your feature request related to a problem? Please describe. When there are subtle differences between platforms or graphics cards, the generated images may differ slightly. I had a test fail because there was a 1 pixel difference between the images (reference vs screenshot).
Describe the solution you'd like
Add a threshold parameter in ImageTester.CompareImage. The simplest thing would be to calculate the number of different pixels divided by the total number of pixels in the image. This way we can allow 1-2% of changes across different hardware before a manual inspection for differences is needed.
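For illustration, here is a minimal sketch of what such a percentage-based comparison could look like; the method shape, the threshold parameter, and the raw RGBA byte layout are assumptions for this example, not the actual ImageTester.CompareImage signature:

```csharp
// Minimal sketch of a percentage-based comparison. All names and the pixel
// layout are assumptions for illustration, not the real ImageTester API.
public static class ImageDiffSketch
{
    /// <summary>
    /// Returns true when the fraction of differing pixels is within the threshold
    /// (e.g. 0.02 allows up to 2% of pixels to differ).
    /// Both images are assumed to be the same size, 4 bytes per pixel (RGBA).
    /// </summary>
    public static bool CompareWithThreshold(byte[] reference, byte[] screenshot, double threshold = 0.02)
    {
        if (reference.Length != screenshot.Length)
            return false;

        int totalPixels = reference.Length / 4;
        int differingPixels = 0;

        for (int i = 0; i < reference.Length; i += 4)
        {
            // A pixel counts as different if any of its channels differ.
            if (reference[i] != screenshot[i] ||
                reference[i + 1] != screenshot[i + 1] ||
                reference[i + 2] != screenshot[i + 2] ||
                reference[i + 3] != screenshot[i + 3])
            {
                differingPixels++;
            }
        }

        return (double)differingPixels / totalPixels <= threshold;
    }
}
```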
I don't think there's much point in implementing a more complex comparison algorithm, but in case people are interested in the topic, here's an interesting SO post.
The threshold should be configurable per end user test class.
I believe this specific test is meant to ensure rendering stays consistent between runs. Implementing a threshold would just sidestep the issue while reducing the test's usefulness. Rendering can be inconsistent across vendors, APIs, drivers, or even driver settings, but it definitely should be consistent if none of those change. Given that, here's what I propose (a rough code sketch follows the list):
- From a known good state, we compile the test scene as an executable.
- Include this compiled executable with that test
- Run that executable first when the test boots up
- Retrieve the image it generated; this will be our reference image, and it shouldn't change, since the executable and assets are all precompiled.
- The test then continues: it compiles the graphics code and renders the scene based on the changes the user made in their branch.
- We now compare the results; if the comparison fails, the changes the user introduced in that branch messed with the rendering.
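To make the proposed flow concrete, here's a rough sketch of how such a test could be structured; the executable name, RenderCurrentScene, and ImagesMatch are purely illustrative placeholders, not existing Stride APIs:

```csharp
// Rough sketch of the proposed flow. Every name here (ReferenceRenderer.exe,
// RenderCurrentScene, ImagesMatch) is an illustrative placeholder, not an
// existing Stride API.
using System;
using System.Diagnostics;

public class VisualParityTestSketch
{
    public bool RunParityTest()
    {
        // 1. Run the precompiled, known-good executable bundled with the test.
        //    Its output should never change, because the executable and its assets are frozen.
        using (var referenceRun = Process.Start("ReferenceRenderer.exe", "--output reference.png"))
        {
            referenceRun.WaitForExit();
        }

        // 2. Compile and render the scene with the code from the user's current branch.
        string currentImagePath = RenderCurrentScene();

        // 3. Compare the two images; a mismatch means the branch changed rendering behaviour.
        return ImagesMatch("reference.png", currentImagePath);
    }

    // Placeholders for the parts the test harness would actually provide.
    private string RenderCurrentScene() => throw new NotImplementedException();
    private bool ImagesMatch(string referencePath, string currentPath) => throw new NotImplementedException();
}
```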
This is a lot more complex than what you propose, but I do think it is closer to what that test is trying to do. If we can't do what I'm proposing, I'd say it is better to leave that specific test to run only in CI; that environment is completely static, so it should always pass if the reference is set up right.
I think @manio143 already builds the images from master, effectively zeroing out differences between platform and graphics card. The issue I think he's looking to address is that on my system all tests succeed; on his system they used to, but now one test trips over just one pixel?
If that is the case, then the test is not consistent from run to run, which is what should be addressed, I think.
Where I'm coming from with this is thinking about end-user scenarios - we should use high-confidence thresholds like 0% for Stride's own tests, but that may not be necessary in all cases. I imagine a situation where a user wants to check that a scene in their game works well by comparing screenshots after every meaningful action, but if they change a small detail in the scene it may not make sense for those tests to start failing straight away. I guess it depends on how often the screenshots would break and how much value they would provide to the user.
We may want to see what the spread in success rate of those tests is on various hardware and try to pinpoint why these changes are happening.