
Test results depend on the Mac device

Open • navartis opened this issue 6 years ago • 21 comments

After updating macOS to Catalina 10.15 and Xcode to 11.1 (11A1027),

I found that tests run on other Mac machines fail against reference images saved on my Mac.

I'm using the latest stable version of the library, 6.2.0.

I spent two days trying to find the reason and fix it, implemented a "Highlight Different Pixels" feature, and created pull request #108, which can help investigate such situations.

But at this point I have no idea how to fix this issue.

[Attached screenshots from 2019-10-29: the diffed image vs. the highlighted-pixels view, plus the failed image and reference image]

navartis • Oct 29 '19

The same happens to me. I recorded all the images on Catalina, but running the same tests in a Travis environment with macOS 10.14, they fail on the diffs, which look like shadows with different contrast.

MaikCL • Nov 19 '19

Same issue here. Any idea about the reason?

mime29 • Dec 18 '19

It seems to me the reason is that in Xcode 11.x Apple migrated Simulator rendering from the CPU to the GPU.

GPU calculations are faster but less accurate, and their results may depend on the hardware GPU vendor and drivers.

I spent a lot of time on this, but the only workaround that works for me is passing perPixelTolerance: 6/256 to FBSnapshotVerifyView.

And this pull request: #108
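
For anyone copying that workaround, here's a minimal sketch, assuming the Swift API of iOSSnapshotTestCase 6.x (the test class and view are hypothetical stand-ins):

```swift
import UIKit
import FBSnapshotTestCase

final class MyScreenSnapshotTests: FBSnapshotTestCase {

    override func setUp() {
        super.setUp()
        recordMode = false // flip to true once to (re)record reference images
    }

    func testMyScreen() {
        // Hypothetical view under test; substitute your own.
        let view = UILabel(frame: CGRect(x: 0, y: 0, width: 320, height: 44))
        view.text = "Hello"

        // perPixelTolerance is the fraction by which a pixel's colour
        // components may differ before the pixel counts as changed:
        // 6/256 (~2.3%) lets each 8-bit channel drift by up to 6 levels,
        // absorbing small GPU rendering differences between machines.
        FBSnapshotVerifyView(view, perPixelTolerance: 6 / 256)
    }
}
```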

navartis • Dec 20 '19

@navartis Why are you using the 6/256 number for perPixelTolerance?

luiz-brasil • Mar 16 '20

We haven't found a good solution on our side yet, but we noticed one thing: the random diff results seem to come from views containing transparent assets (images whose alpha channel is not set to 1). Do you observe the same thing on your side?

mime29 • Mar 19 '20

@luiz-brasil Unfortunately, 6/256 is a magic number. I hate magic numbers in code, but in this case I have no better ideas.

navartis • Mar 19 '20

We're also seeing this issue. Interestingly, it seems to occur only when using usesDrawViewHierarchyInRect or when snapshotting during a UI test (not a unit test).

It could be related to the view being rendered in a UIWindow.

I also got the same results between two Mac Pros (i.e. identical machines produce the same result).

Could this be graphics-card related, and is it time to file a Radar?
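
For reference, a minimal sketch of that configuration, assuming the standard iOSSnapshotTestCase API (the test content is hypothetical):

```swift
import UIKit
import FBSnapshotTestCase

final class DrawHierarchySnapshotTests: FBSnapshotTestCase {

    override func setUp() {
        super.setUp()
        // Snapshot via UIView.drawHierarchy(in:afterScreenUpdates:) instead of
        // CALayer.render(in:) -- the rendering path reported above as the one
        // that produces machine-dependent pixels.
        usesDrawViewHierarchyInRect = true
    }

    func testExample() {
        // Hypothetical view under test.
        let view = UIImageView(frame: CGRect(x: 0, y: 0, width: 100, height: 100))
        view.backgroundColor = .blue
        FBSnapshotVerifyView(view)
    }
}
```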

samskiter • Apr 08 '20

I'm so glad that I stumbled across this thread, because I think I'm seeing the same thing.

About a week ago, I started noticing failures in some of my iOS Snapshot tests. These tests passed locally - both when run via Xcode and Fastlane - but failed when run on CI. I'm still trying to figure out which OS our build machine is using (though I suspect it might be 10.14.x), but it just so happens that I updated my machine to Catalina last week, and now I'm getting these failures.

I was able to capture the diff screenshots after running the tests on CI, and they're gray. I asked a designer to take one and bump up the contrast, and only with a massive bump in contrast were we able to see the slightest traces of some screen elements. (And I really had to squint to see them, even after the adjustment.)

Seems like the workaround at the moment is to bump up the tolerance. Has anyone found a different solution?

pigeon56 • Apr 24 '20

Just to confirm - is everyone else seeing that this not only happens between 10.14 and 10.15 (which is normal for any upgrade) but also between machines that are running the same version of macOS Catalina (10.15)?

Finally, has anyone got a simple test case that could be sent to Apple?

samskiter • May 05 '20

@samskiter unfortunately I don't have a specific test case, primarily because these errors occur randomly.

I can affirm that the error occurs between different machines running the same version of macOS Catalina.

luiz-brasil • May 05 '20

@samskiter Results seem to be GPU-dependent. In the Simulator you can get different results by switching between the integrated and discrete GPUs (File → GPU Selection). I've done a lot of investigation, and the only fix was to increase the tolerance of the snapshots.

One of the Apple engineers who works on the Simulator said the way forward is to increase the pixel-matching tolerance (conversation on Twitter). Hope that helps.

gspiers • May 05 '20

😢

navartis • May 05 '20

Wow, Apple just broke an entire tool. Thanks, Apple.

samskiter • May 05 '20

Because of this issue, what values are teams using for perPixelTolerance and overallTolerance?

JustinDSN • May 06 '20

I'm running a little experiment on a few tests. I'll have to follow up tomorrow to let you know if it works for all of my tests, but so far I'm having luck with it.

For almost all screens: FBSnapshotVerifyView(UIImageView(image: croppedImage), identifier: identifier, perPixelTolerance: 6/256) // tried this number after seeing it higher up in this thread

This fixed most of my failures, but screens that use transparent elements still fail. For screens with transparent elements (where alpha < 1), this is actually passing for me: FBSnapshotVerifyView(UIImageView(image: croppedImage), identifier: identifier, overallTolerance: 0.02)

Again, these successes are based on a very small sampling of tests that are currently passing. Nearly all screens are being verified with the per-pixel value, as we don't use a whole lot of transparency. I'll update this post tomorrow once I know whether or not it's a success.
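
One way to fold that two-tolerance strategy into a single call site; this helper is hypothetical (not part of the library), and the tolerance values are just the ones reported in this thread:

```swift
import UIKit
import FBSnapshotTestCase

extension FBSnapshotTestCase {
    /// Hypothetical convenience wrapper: opaque screens get a small per-pixel
    /// tolerance, while screens containing transparency fall back to an
    /// overall (whole-image) tolerance, per the experiment described above.
    func verifyWithDeviceTolerance(_ view: UIView,
                                   identifier: String = "",
                                   hasTransparency: Bool = false,
                                   file: StaticString = #file,
                                   line: UInt = #line) {
        if hasTransparency {
            // Transparent content diffs unpredictably pixel by pixel, so
            // compare the image as a whole: up to 2% of pixels may differ.
            FBSnapshotVerifyView(view, identifier: identifier,
                                 overallTolerance: 0.02,
                                 file: file, line: line)
        } else {
            // Opaque content only drifts slightly per colour channel.
            FBSnapshotVerifyView(view, identifier: identifier,
                                 perPixelTolerance: 6 / 256,
                                 file: file, line: line)
        }
    }
}
```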

pigeon56 • May 06 '20

@JustinDSN Unfortunately, I'm going to have to spend more time on this until I know exactly which values are going to work for us.

The per-pixel value of 6/256 is working for most screens, but the overall tolerance is giving me problems. It seems like screens that incorporate transparency or use the full-screen modal presentation that was new to iOS 13 must be compared using overall tolerance. I've seen the tests pass locally with an overall tolerance value as low as 0.02, but when I run the tests on CI, even 0.25 isn't enough for one particular screen. (For the others, it's fine.) I'm trying a value of 0.30 on CI right now to see what happens. Even if that passes, it's way too high a threshold to give me any confidence that this is an effective and worthwhile comparison. (Update: it failed at 0.30 too, just for that particular screen.)

To make things a little more fun, I do have a screen that definitely uses transparency and passes perfectly with a per-pixel comparison. It's nearly identical to the one screen I can't get a passing comparison on using either per-pixel or overall tolerance (mentioned in the previous paragraph). Our CI build machine uses an older version of Xcode than I do - which isn't ideal, of course - so maybe that's part of it?

I'm still working with the same small sampling of tests. I'll scale this and update this thread again as soon as I figure out the floor of this overall tolerance threshold and apply it to more tests.

pigeon56 • May 06 '20

Out of interest, does going up from 6/256 also fix your issues without having to use an overall tolerance?

samskiter • May 11 '20

@samskiter It fixed the problem in some places, but not all. Also, my results are based on reference images recorded on my own machine; I don't know whether the result would change if a colleague had recently uploaded a reference image.

Despite some successes with the per-pixel comparison, I'm just not confident in these tests anymore. I've already spent too much time trying to salvage them, and it's just not sustainable. This is a huge letdown, because they were really valuable to our organization. In fact, we caught a bug yesterday morning in a screen where the comparison is still working.

Short answer to your question: no. At the very least, screens that use some form of transparency are definitely going to give you problems with a per-pixel tolerance measurement.

pigeon56 • May 13 '20

That's frustrating. We also depend heavily on these!

samskiter • May 14 '20

Were folks able to find any solution or workaround for this?

tirodkar • Jun 16 '20

Is there no solution to this problem yet? Just add perPixelTolerance: 6/256 where transparency, blur, etc. are used? That didn't work for me.

david6p2 • Aug 04 '21