[Feature Request] Visual Regression Testing

Open mdethlefs opened this issue 1 year ago • 30 comments

Is your feature request related to a problem? Please describe.

It always takes a lot of time to check whether certain components look different or wrong. Tools like Chromatic's visual regression testing help a lot with that. They take a screenshot of a component, and with every build a new screenshot is taken and compared against the old one.

Describe the solution you'd like

  • Maybe there should be a command like - assertVisual: xyz - which creates a screenshot the first time it is used. the xyz is a keyword so you can compare multiple screenshots and flows
  • When the command is used again maestro will check if a screenshot is already created and check if there are visual differences.
    • if there is a difference the flow fails.
    • you would have to manually check if the visual difference is ok or not
      • if it is ok and wanted, you can accept the new screenshot as the new "master" screenshot. Flows will compare against that screenshot in the future
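
The workflow in the list above can be sketched roughly as follows. This is only an illustration in Python, not Maestro code: the function names `assert_visual` and `accept`, the `baseline_dir` layout, and the byte-for-byte comparison (standing in for a real image diff) are all assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def assert_visual(name: str, actual: Path, baseline_dir: Path) -> str:
    """Compare `actual` with the stored baseline called `name`.

    Returns "created" on first use, "passed" when the files match,
    and "failed" when they differ.
    """
    baseline = baseline_dir / f"{name}.png"
    if not baseline.exists():
        baseline_dir.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(actual, baseline)  # first use: record the baseline
        return "created"
    # Assumption: byte equality stands in for a real image comparison.
    if baseline.read_bytes() == actual.read_bytes():
        return "passed"
    return "failed"


def accept(name: str, actual: Path, baseline_dir: Path) -> None:
    """Promote a manually reviewed screenshot to be the new baseline."""
    shutil.copyfile(actual, baseline_dir / f"{name}.png")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        shot = root / "shot.png"
        shot.write_bytes(b"version-1")
        print(assert_visual("login", shot, root / "baselines"))  # created
        print(assert_visual("login", shot, root / "baselines"))  # passed
        shot.write_bytes(b"version-2")
        print(assert_visual("login", shot, root / "baselines"))  # failed
        accept("login", shot, root / "baselines")
        print(assert_visual("login", shot, root / "baselines"))  # passed
```

The "accept" step is deliberately a separate, explicit call, since the request is that a human decides whether a visual difference is wanted.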

mdethlefs avatar Jul 05 '23 13:07 mdethlefs

+1

otoniel-isidoro avatar Jul 12 '23 20:07 otoniel-isidoro

+1

kassemitani avatar Oct 03 '23 12:10 kassemitani

+1

milesingrams avatar Oct 09 '23 18:10 milesingrams

+1

mrgklwong avatar Jan 04 '24 15:01 mrgklwong

+1

news-roccosalvetti avatar Jan 31 '24 10:01 news-roccosalvetti

+1

Rohphi avatar Feb 21 '24 08:02 Rohphi

+1

nabilfreeman avatar Apr 08 '24 13:04 nabilfreeman

+1

radhakrishnanakireddy avatar Apr 08 '24 17:04 radhakrishnanakireddy

+1

+1

paulsweeting avatar May 29 '24 16:05 paulsweeting

I would pay $50 a month for this feature

nabilfreeman avatar May 29 '24 16:05 nabilfreeman

+1

Stackustack avatar Jun 03 '24 17:06 Stackustack

Much of the logic for the most basic version of this feature already exists in the code base. Maestro does screenshot image comparison using a percentage change threshold in two places to power other features.
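
To make "percentage change threshold" concrete, here is a minimal Python sketch of that kind of comparison. It is not taken from the Maestro code base: the names `changed_fraction` and `images_match` are made up, and images are modeled as plain nested lists of RGB tuples so the example needs no imaging library.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels


def changed_fraction(a: Image, b: Image) -> float:
    """Return the fraction of pixels (0.0 to 1.0) that differ between a and b."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in a)
    if total == 0:
        return 0.0
    changed = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return changed / total


def images_match(a: Image, b: Image, threshold: float) -> bool:
    """True when the changed fraction does not exceed the threshold."""
    return changed_fraction(a, b) <= threshold


if __name__ == "__main__":
    white, black = (255, 255, 255), (0, 0, 0)
    reference = [[white, white], [white, white]]
    actual = [[white, white], [white, black]]
    print(changed_fraction(reference, actual))   # 0.25
    print(images_match(reference, actual, 0.3))  # True
    print(images_match(reference, actual, 0.1))  # False
```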

chriszs avatar Jul 02 '24 10:07 chriszs

Hey all! We're thinking about implementing this feature and we need to know what you want :)

All feedback, opinions, ideas, are very much appreciated.

New command - assertVisual

The assertVisual command requires a name argument. This name should be the name of the current screen. It takes a screenshot and compares it against a screenshot file in .maestro/reference_screenshot/<name>.png. If the screenshot doesn't exist, it saves it.

If assertVisual fails (i.e. the reference screenshot differs enough from the actual screenshot), it saves the ACTUAL screenshot to e.g. ~/.maestro/failed/<name> – that can later be downloaded for inspection by the user, and the reference screenshot can be easily updated.

- assertVisual:
    name: <name>
    threshold: <float in 0-1 range> # defaults to env var $MAESTRO_CLI_VISUAL_DIFF_THRESHOLD

it can also be used in a short form:

- assertVisual: <name>
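
A rough Python sketch of how the two forms could be normalized into one structure, including the threshold default. Only the `name` and `threshold` keys and the `MAESTRO_CLI_VISUAL_DIFF_THRESHOLD` variable come from the proposal; `parse_assert_visual`, the `AssertVisual` class, and the fallback value used when the variable is unset are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Union

ENV_VAR = "MAESTRO_CLI_VISUAL_DIFF_THRESHOLD"  # named in the proposal
FALLBACK_THRESHOLD = 0.0  # assumption: the proposal does not define this case


@dataclass
class AssertVisual:
    name: str
    threshold: float


def parse_assert_visual(
    value: Union[str, Mapping[str, object]],
    env: Optional[Mapping[str, str]] = None,
) -> AssertVisual:
    """Accept the short form (a string) or the long form (a mapping)."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        name, raw = value, None
    else:
        name, raw = str(value["name"]), value.get("threshold")
    if raw is None:
        # No explicit threshold: fall back to the environment variable.
        raw = env.get(ENV_VAR, FALLBACK_THRESHOLD)
    threshold = float(raw)
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in the 0-1 range")
    return AssertVisual(name, threshold)


if __name__ == "__main__":
    print(parse_assert_visual("Login screen", env={ENV_VAR: "0.05"}))
    print(parse_assert_visual({"name": "Home screen", "threshold": 0.1}, env={}))
```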

Example usage

appId: com.example.example
---
- launchApp:
    clearState: true
- assertVisual: Login screen # compares the current screen against `.maestro/reference_screenshot/Login screen.png`
- tapOn: Sign in as guest
- assertVisual: Home screen # compares the current screen against `.maestro/reference_screenshot/Home screen.png`

Problems

  • How to make the experience of "accepting the change" usable/enjoyable on CI?

    Possible solution: Make it easy to upload "new" screenshots as artifacts on CI (e.g. using actions/upload). Users will download the "new" screenshots and manually update the old ones.

  • Need to exclude irrelevant system UI such as status bar

    Possible solution: get height of status bar and crop the image accordingly
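
    A minimal Python sketch of that cropping step, assuming the status bar height in pixels is already known (how it would be obtained from the device is not covered here). `crop_status_bar` is a hypothetical helper, and images are modeled as nested lists of RGB tuples.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels, top row first


def crop_status_bar(image: Image, status_bar_height: int) -> Image:
    """Return a copy of the image without its top `status_bar_height` rows."""
    if not 0 <= status_bar_height <= len(image):
        raise ValueError("status bar height must fit inside the image")
    return [list(row) for row in image[status_bar_height:]]


if __name__ == "__main__":
    clock = (10, 10, 10)      # pixels belonging to the status bar
    content = (200, 200, 200)  # pixels belonging to the app
    screenshot = [[clock, clock], [content, content], [content, content]]
    cropped = crop_status_bar(screenshot, 1)
    print(len(cropped))  # 2
```

    Both the reference and the actual screenshot would need the same crop before they are compared.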

  • Reference images take up a lot of space? Where to store them?

    Using Git LFS feels bad - GitHub's limit for LFS data transfer is 1GB/user/month. Using an external file storage (GCS, Amazon S3) seems like too much hassle.

bartekpacia avatar Aug 05 '24 11:08 bartekpacia

Love these ideas, I would love to add some snapshot testing to our workflows. Where to store/upload the screenshots and what to compare against is an interesting problem, though! We started looking into whether there was any way we could generate screenshots throughout our tests, even just to show other teams and members what the flows look like, but the screenshots only work locally and not on Cloud (for obvious reasons).

  • Is there any way we can integrate this with Cloud though, for teams who use it in CI?

My other initial thoughts:

  • excluding the status bar seems like a sensible approach
  • probably more of an education piece around guidelines on how the threshold will work with examples
  • presumably if this step fails, the test also fails and stops? This will be more of a workflow thing I guess, in that maybe you need separate dedicated snapshot tests vs the existing functional flows (unless your app is very stable and consistent)

simon-gilmurray avatar Aug 05 '24 11:08 simon-gilmurray

I wonder if converting the image to webp or another format would help with the size issue.

chriszs avatar Aug 05 '24 13:08 chriszs

Firstly: WOOO! This looks awesome!

Little stuff:

  • Worth considering a selector option for a partial screenshot?
  • I write tests for a React Native app that works and looks largely the same for both platforms. Is it worth trying to bake in the platform to the path or something, rather than requiring folk implementing tests to write the conditional logic everywhere?

Bigger stuff:

  • Tests should be runnable in cloud and locally. Is there stuff beyond status bars to consider to keep that capability? Some ability to create a better representation emulator/simulator? Or skipping the step if the local screen size doesn't match and there's an indicator for cloud use in the config.yaml? Lots of possibilities I've not thought of.

On your questions:

  • Implementing "Accepting the change" - what's missing from the current suggestion? You're already giving back the new golden, and can let folk manage their files as they need to let the next pass succeed. The real issue is in clear messaging. There's possibly a different workflow here between "Golden Missing" (and so give the Cloud user their new golden images) versus "Golden Failed" (where an intentionally changed UI causes a test to fail expectedly).
  • "Where to store the files" looks like it could balloon this feature to switching the Cloud model from the existing one of "an app file and a workspace" to "Real integration with the repositories". The description already suggests giving the user back their "failed" screenshots, so that they can reconcile. Maybe it's enough to let the user deal with storage? Or is this also about transfer bandwidth? When I maestro cloud blah blah I'd be transferring a lot more each time. Would it necessitate a change in pricing too?

Fishbowler avatar Aug 05 '24 20:08 Fishbowler

@bartekpacia That's great news, thanks a lot for considering this! 🤩

We have an app with dynamic content, so partial screenshot comparison would be much appreciated. Maybe this could at some point even lead to screenshot selectors, e.g. "click on the book cover looking like the image in this file".

One question for the screenshot taken if comparison with golden fails: will it also include diff markers (e.g. pixels which are different are tinted light red)? That would be super useful to quickly spot issues.
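
For illustration, diff markers of this kind could be produced along these lines. This is a Python sketch under assumptions, not a description of what Maestro does: `mark_differences` is a made-up name, images are nested lists of RGB tuples, and the tint is a simple 50/50 blend with pure red.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels

RED: Pixel = (255, 0, 0)


def _tint(px: Pixel) -> Pixel:
    """Blend a pixel half-and-half with red (integer arithmetic)."""
    return ((px[0] + RED[0]) // 2, (px[1] + RED[1]) // 2, (px[2] + RED[2]) // 2)


def mark_differences(reference: Image, actual: Image) -> Image:
    """Return a copy of `actual` with every differing pixel tinted red."""
    if len(reference) != len(actual):
        raise ValueError("images must have the same height")
    marked: Image = []
    for ref_row, act_row in zip(reference, actual):
        if len(ref_row) != len(act_row):
            raise ValueError("images must have the same width")
        marked.append([
            px if px == ref_px else _tint(px)
            for ref_px, px in zip(ref_row, act_row)
        ])
    return marked


if __name__ == "__main__":
    white, grey = (255, 255, 255), (200, 200, 200)
    reference = [[white, white]]
    actual = [[white, grey]]
    print(mark_differences(reference, actual))
```

Unchanged pixels are copied as-is, so the marked image still shows the actual screen with only the differing regions highlighted.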

ubuntudroid avatar Aug 05 '24 22:08 ubuntudroid

hey, thanks a lot for all the feedback!

optional failures

presumably if this step fails, the test also fails and stops? This will be more of a workflow thing I guess, in that maybe you need separate dedicated snapshot tests vs the existing functional flows (unless your app is very stable and consistent)

assertVisual will have an "optional" argument that accepts a bool.
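
A small Python sketch of how such a flag could change the result of a step. It assumes the same semantics that are spelled out later in this thread for assertVisualAI (a mismatch becomes a warning instead of a failure); the `Outcome` enum and `resolve_outcome` function are hypothetical names.

```python
from enum import Enum


class Outcome(Enum):
    PASSED = "passed"
    WARNED = "warned"  # mismatch reported, flow continues
    FAILED = "failed"  # mismatch reported, flow stops


def resolve_outcome(matches: bool, optional: bool = False) -> Outcome:
    """Map a comparison result and the `optional` flag to a step outcome."""
    if matches:
        return Outcome.PASSED
    return Outcome.WARNED if optional else Outcome.FAILED


if __name__ == "__main__":
    print(resolve_outcome(matches=False, optional=True).value)   # warned
    print(resolve_outcome(matches=False, optional=False).value)  # failed
```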

Platform in path

I write tests for a React Native app that works and looks largely the same for both platforms. Is it worth trying to bake in the platform to the path or something, rather than requiring folk implementing tests to write the conditional logic everywhere?

@Fishbowler could you explain what you want in more detail?

Partial screenshots

I'm hesitant toward that, as it'll quickly increase the complexity of your tests, and make them more flaky and much less portable across different devices.

Diff markers

will it also include diff markers (e.g. pixels which are different are tinted light red)?

This is a very good idea (actually a necessary one to allow for a pleasant workflow)

bartekpacia avatar Aug 06 '24 17:08 bartekpacia

Regarding visual regression tests:

This is what the Chromatic user interface looks like: https://www.chromatic.com/videos/visual-test-hero.mp4

As you can see, there is also a toggle to display the visual differences more specifically. And there is a button to accept or reject the change. I think there will probably be no getting around a graphical user interface similar to that of Chromatic. Or rather, I have no idea how it could be solved differently.

By the way: if this feature gets implemented and works as well as Chromatic, it would be a complete game changer for many people. It would bring Maestro to a whole new level.

mdethlefs avatar Aug 06 '24 17:08 mdethlefs

Partial screenshots would be useful for isolating specific components.

chriszs avatar Aug 06 '24 23:08 chriszs

Platform in path

I'd like to be able to check that the login screen in Android looks like it did before. I'd like to be able to check that the login screen in iOS looks like it did before. They're close, but not close enough - Native components and whatnot. I don't want lots of conditional logic in my tests, I want assertVisual to take care of it all for me.
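
One way to read this request, sketched in Python: the reference path is derived from the platform the flow is running on, so the flow itself stays free of conditionals. The per-platform subdirectory layout and the `reference_path` helper are assumptions; only the `.maestro/reference_screenshot` root comes from the proposal above.

```python
from pathlib import PurePosixPath

SUPPORTED_PLATFORMS = ("android", "ios")


def reference_path(
    name: str,
    platform: str,
    root: str = ".maestro/reference_screenshot",
) -> PurePosixPath:
    """Build a per-platform reference path for the screenshot `name`."""
    if platform not in SUPPORTED_PLATFORMS:
        raise ValueError(f"unsupported platform: {platform}")
    # Assumption: one subdirectory per platform under the reference root.
    return PurePosixPath(root) / platform / f"{name}.png"


if __name__ == "__main__":
    print(reference_path("Login screen", "android"))
    print(reference_path("Login screen", "ios"))
```

With this, a single `assertVisual: Login screen` step would resolve to a different reference file on each platform.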

Partial Screenshots

I'd like to be able to care about what I care about and not need to always set consistent data to make the screenshots match (like data that would come from an API). Hierarchy for a selector would give good coordinates that would likely remain mostly consistent for the same app running on the same device.
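
As an illustration of the idea, a partial screenshot could be cut out of the full one using the bounds of the selected element. This Python sketch assumes the bounds are already available as pixel coordinates (left, top, right, bottom); `crop_to_bounds` is a hypothetical helper and images are nested lists of RGB tuples.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels


def crop_to_bounds(image: Image, left: int, top: int, right: int, bottom: int) -> Image:
    """Return the region [left, right) x [top, bottom) of the image."""
    height = len(image)
    width = len(image[0]) if image else 0
    if not (0 <= left < right <= width and 0 <= top < bottom <= height):
        raise ValueError("bounds must describe a non-empty region inside the image")
    return [row[left:right] for row in image[top:bottom]]


if __name__ == "__main__":
    # Encode each pixel's (x, y) position in its colour to make the crop visible.
    screenshot = [[(x, y, 0) for x in range(4)] for y in range(3)]
    print(crop_to_bounds(screenshot, 1, 1, 3, 3))
```

Only the cropped region would then be compared, so dynamic content elsewhere on the screen would not affect the result.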

Fishbowler avatar Aug 07 '24 07:08 Fishbowler

Hey all,

Thanks a lot for all the valuable feedback. It means a lot to us and helps us build what you need.

It's clear that this feature holds a lot of value. That said, there's an inherent problem with it: having to manually maintain the baseline. The larger the app, the more time-consuming it'll get.

Proposal

TL;DR We want to give you the advantages of assertVisual without the need to maintain the baseline screenshots.

Based on our experience building App Quality Copilot, we're quite confident that this can actually work and be useful.

Here's how we envision it:

- tapOn: Get started
- assertVisualAI:
    assertion: "Assert that login button is visible with multi factor authentication of OTP"
    optional: <bool> # if `true`, it'll be a warning, not a failure

assertVisualAI will not be the replacement of assertVisual. If what you want to do is compare screenshots pixel by pixel – sure, Maestro should let you do it, and we'll build this feature.

If you don't want the burden of maintaining baseline screenshots though, but still want some assurance that your screens "look right", we want to make it possible (and easy). In particular, assertVisualAI could catch the following categories of issues:

  • The assertion is false
  • text/views are cropped or overlapping
  • obvious localization problems

Actually, the assertion argument (the prompt) would be optional - you could just call assertVisualAI and still get the validations above.

We will also provide a way to improve AI responses by flagging false positives.

Maestro Studio integration

We'd like to surface the responses you get from AI in Maestro Studio, to make the experience smoother. Of course, at the same time we'd make sure it works equally well in CLI-only mode.

Model selection

We don't want to force any specific AI model. There'd be configuration in config.yaml so you could select between OpenAI's GPT 4o, GPT 4o-mini, Claude Sonnet, or even some locally running model.

The future

We have many ideas around this. One of them is taking some existing model and finetuning it to perform even better for exactly this kind of task – quality assurance of mobile app UI.

Overall – what do you think? We'd love to get your thoughts on this.

bartekpacia avatar Aug 07 '24 12:08 bartekpacia

I think it's a cool option to have available, but right now I'd rather maintain my baseline and retain my deterministic tests.

Caveat: I've not played with the App Quality Copilot at all.

Slightly O/T:

My experience in testing AI systems (as opposed to using them to help me test) has given me a strong scepticism that the generative models can be relied upon for consistency of output, which in my current context (healthcare) means I can't rely on it for testing evidence.

Fishbowler avatar Aug 07 '24 15:08 Fishbowler

Something like this would be a great start from my perspective, and perhaps it can use deeplinks or universal links so you can just specify a bunch of paths and get screenshots.

https://docs.fastlane.tools/img/getting-started/ios/htmlPagePreviewFade.jpg

Can add more specific component testing after. Just would like to look at my overall layout on a bunch of different simulators first.

samducker avatar Aug 26 '24 18:08 samducker