playwright feat: better support for visual regression testing

Playwright Test has a built-in toMatchSnapshot() method to power Visual Regression Testing (VRT).

However, VRT is still challenging due to variance between host environments. There are a number of measures we can take right away to drastically improve the experience in @playwright/test:

[ ] support for docker test fixture to run browsers inside docker image.
[ ] support for blur in matching snapshot to counteract antialiasing
[x] better UI for reviewing snapshot diffs

Interesting context:

migration from backstopjs to @playwright/test

Aug 12 '21 10:08 aslushnikov

I think https://github.com/americanexpress/jest-image-snapshot provides a nice suite of options for various VRT scenarios. Test scenarios vary widely, depending on the context (testing components, whole pages, text-heavy or not, etc).

Besides blurring, which helps a lot with antialiasing, it would be nice if multiple image comparison algorithms (e.g. SSM) were supported. Alternative image comparison algorithms could be left to userland, if they can be plugged into toMatchSnapshot via a common interface.
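To make the "common interface" idea concrete, here is a minimal sketch of what a pluggable comparator could look like, using a structural-similarity score as the example algorithm. Everything here (`GrayImage`, `ImageComparator`, `globalSsim`, `ssimComparator`) is a hypothetical name for illustration and not part of the Playwright API. The score is computed over the whole image in one window; real SSIM implementations usually average over small sliding windows.

```typescript
// Hypothetical types for illustration only; not part of @playwright/test.
interface GrayImage {
  width: number;
  height: number;
  data: Float64Array; // luminance values in [0, 255], row-major
}

type ImageComparator = (
  actual: GrayImage,
  expected: GrayImage
) => { pass: boolean; score: number };

// Simplified SSIM computed over the whole image as a single window.
function globalSsim(a: GrayImage, b: GrayImage): number {
  if (a.width !== b.width || a.height !== b.height)
    throw new Error("images must have the same dimensions");
  const n = a.data.length;
  let meanA = 0;
  let meanB = 0;
  for (let i = 0; i < n; i++) {
    meanA += a.data[i];
    meanB += b.data[i];
  }
  meanA /= n;
  meanB /= n;
  let varA = 0;
  let varB = 0;
  let cov = 0;
  for (let i = 0; i < n; i++) {
    const da = a.data[i] - meanA;
    const db = b.data[i] - meanB;
    varA += da * da;
    varB += db * db;
    cov += da * db;
  }
  varA /= n;
  varB /= n;
  cov /= n;
  // Standard SSIM stabilizing constants for an 8-bit dynamic range.
  const c1 = (0.01 * 255) ** 2;
  const c2 = (0.03 * 255) ** 2;
  return (
    ((2 * meanA * meanB + c1) * (2 * cov + c2)) /
    ((meanA * meanA + meanB * meanB + c1) * (varA + varB + c2))
  );
}

// Wraps the score in the hypothetical comparator interface.
function ssimComparator(minScore: number): ImageComparator {
  return (actual, expected) => {
    const score = globalSsim(actual, expected);
    return { pass: score >= minScore, score };
  };
}
```

A userland comparator with this shape could then be swapped for a pixel-based one without changing the test code that calls it.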

Aug 12 '21 10:08 florianbepunkt

Besides blurring, which helps a lot with antialiasing, it would be nice if multiple image comparison algorithms (e.g. SSM) were supported.

@florianbepunkt What's SSM? Is it structural similarity measurement (SSIM)?

Aug 12 '21 14:08 aslushnikov

@aslushnikov Yes, typo.

Aug 12 '21 14:08 florianbepunkt

Solid integration with Storybook would be beneficial for the work I do. Chromatic and Percy do this really well.

Also a UI for reviewing the diffs would be great.

Aug 12 '21 14:08 kevinmpowell

Also a UI for reviewing the diffs would be great.

@kevinmpowell What's the one that you find most handy? Is it a "slider" diff like here:

Aug 12 '21 14:08 aslushnikov

I actually prefer the pixel highlighting (like Playwright already does), but organize all the failing tests in a UI so I can see what failed without having to poke around three different images.

Also being able to A/B toggle the baseline and the test image is nice in some cases.

Aug 12 '21 14:08 kevinmpowell

Slider is rarely useful for me. An onion-skin (transparency overlay) would be more useful.

Aug 12 '21 15:08 kevinmpowell

@aslushnikov Why is toMatchSnapshot() not in the documentation? It cannot be found in the API list, and the article that existed for 1.13, https://playwright.dev/python/docs/1.13.0/test-snapshots, is no longer available for 1.14.

Thanks for thinking about visual regression testing. That's important!

Aug 16 '21 15:08 AlexNetman

On a related note: it would be great if tests could be run cross-platform. Currently the OS platform name is baked into the snapshot filename, so our CI tests sometimes fail due to a name mismatch. https://github.com/microsoft/playwright/issues/7575

Aug 16 '21 19:08 florianbepunkt

support for blur in matching snapshot to counteract antialiasing

It would be nice if we could choose whether such image filters are applied before the snapshot is saved or only when doing the comparison. I would prefer the first option, as it keeps the diff small when creating new snapshots, even for images that change randomly or are flaky.
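As a rough illustration of why blurring reduces sensitivity to isolated pixel differences, here is a minimal sketch of a 3x3 box blur on a grayscale buffer. The function names are hypothetical and this is not how Playwright implements anything; the same `boxBlur3x3` call could be applied either before saving a snapshot or only at comparison time, which is the choice discussed above.

```typescript
// 3x3 box blur over a grayscale image stored row-major.
// Pixels near the border average only over in-bounds neighbors.
function boxBlur3x3(
  data: Float64Array,
  width: number,
  height: number
): Float64Array {
  const out = new Float64Array(data.length);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let sum = 0;
      let count = 0;
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          const nx = x + dx;
          const ny = y + dy;
          if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
          sum += data[ny * width + nx];
          count++;
        }
      }
      out[y * width + x] = sum / count;
    }
  }
  return out;
}

// Largest per-pixel absolute difference between two equally sized images.
function maxAbsDiff(a: Float64Array, b: Float64Array): number {
  if (a.length !== b.length) throw new Error("size mismatch");
  let max = 0;
  for (let i = 0; i < a.length; i++) {
    max = Math.max(max, Math.abs(a[i] - b[i]));
  }
  return max;
}
```

A single stray pixel is spread over its neighborhood after blurring, so its peak difference drops and is less likely to exceed a per-pixel threshold. The trade-off is that real small regressions are softened in the same way.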

Aug 18 '21 15:08 lo1tuma

Please allow an auto-generated filename when toMatchSnapshot is called without a name, similar to how toMatchSnapshot works in Jest.

[ ] Auto-generate filename when name is not specified for toMatchSnapshot
[ ] Set default toMatchSnapshot file extension in playwright.config.ts

E.g.

// foo.spec.ts toMatchSnapshot() => foo.spec.ts.snap (default extension customizable in playwright.config.ts)
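One way such auto-naming could work is sketched below: derive a slug from the test title and append a per-test counter, with the extension coming from a configurable default. `SnapshotNamer` and its behavior are purely hypothetical; this is not Playwright's naming scheme.

```typescript
// Hypothetical helper; any real naming scheme may differ.
class SnapshotNamer {
  private counters = new Map<string, number>();
  private defaultExtension: string;

  constructor(defaultExtension: string = "png") {
    this.defaultExtension = defaultExtension;
  }

  // Returns a new, unique name for each unnamed assertion in a test.
  next(testTitle: string): string {
    const index = (this.counters.get(testTitle) ?? 0) + 1;
    this.counters.set(testTitle, index);
    const slug = testTitle
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-")
      .replace(/^-+|-+$/g, "");
    return `${slug}-${index}.${this.defaultExtension}`;
  }
}
```

With this sketch, repeated unnamed assertions inside the same test get stable, ordered names, so inserting a new assertion in the middle of a test would shift the names of the later ones.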

When you have a lot of screenshot assertions in one file, we can avoid writing a lot of filename inputs:

Aug 31 '21 10:08 ts-23

Thanks for thinking about this. The blur feature is something that will help us; we had something similar before with Puppeteer that helped us do comparisons on animated pages. In addition, it would be really useful to be able to ignore specific parts of the screen, especially those where we have more dynamic data (videos/images).

Sep 06 '21 07:09 sergioariveros

Blur would help us greatly. Also, the slider view would be incredible as well.

Nov 08 '21 21:11 Doug-Bowen

We're also really interested in these improvements. We had to disable visual tests for now because they are randomly failing because a few pixels are off, even when increasing the threshold. Blur should help here hopefully.

Nov 17 '21 10:11 z0n

I suggest solving the biggest pain point, which is how to store this stuff in a git repo so it doesn't blow up in size (i.e. store only the last snapshot). Git LFS kind of works, but it's painful. Maybe something else would work better? For reference: https://github.com/americanexpress/jest-image-snapshot/issues/92

It would be great if these snapshot dirs were automatically marked in git to store only the last revision.

Dec 06 '21 12:12 damaon

We're using Git LFS, what's your issue with it? Once we had it set up for everyone (we're using Mac, Windows and Linux), it worked fine. We're storing all images in the repo using Git LFS (*.png) so there's no work involved when adding snapshots to new tests either.

The only issue I have is comparing the image diffs in VS Code when committing new images as the old image is not shown in the diff view. The diff is working fine in the GitLab merge request view though so that's not a big issue.

Dec 06 '21 14:12 z0n

Hi @aslushnikov! This was pushed to the next version a few times now, could you please add this to the roadmap (if there is one?) so we can have a rough estimate on when this is coming?

I need to implement some visual tests soon™️ and it would be great if I wouldn't need another tool for that. I need to know if there will be improvements to this in 2 months or 2 years though.

Feb 04 '22 10:02 z0n

Hey @z0n, there's no roadmap. My guesstimate is that we'll have all the pieces together by summer 2022; the priority of VRT keeps rising.

Feb 04 '22 15:02 aslushnikov

We're using Git LFS, what's your issue with it?

It works for me, but I wanted to use it at one company that had poor infrastructure, and it didn't work well with Jenkins, for example, so I couldn't easily work around it.

Also, Git LFS behaved strangely with rebases, and people had a lot of trouble with it when jumping between branches, if I remember correctly.

It works, but the experience is suboptimal.

Feb 07 '22 17:02 damaon

Hey folks! Here's an update on screenshots and blurring.

I see lots of you requested a "blur" option to pre-blur images before comparison. While I imagine it can help with certain issues, it's a very big hammer, so I wonder if we can do a more delicate job.

I'd appreciate if you could share screenshots (actual / expected) that fail for you with regular diff, but pass with preblur. This way we'll have some real-world data to play with!

Feb 17 '22 05:02 aslushnikov

Many folks mentioned that they want pre-blur to avoid snapshot failures due to a few pixel differences.

New options have landed on tip-of-tree: pixelCount and pixelRatio. These are supposed to help in these cases. Please give them a try and let me know if you still need pre-blur!

$ npm i @playwright/test@next
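To illustrate the idea behind a pixel-count and pixel-ratio tolerance, here is a self-contained sketch on raw RGBA buffers. It reuses the option names pixelCount and pixelRatio from the comment above, but the `DiffTolerance` type, the helper functions, and the rule that the stricter limit wins when both are set are assumptions made for this example, not a description of Playwright's implementation.

```typescript
// Option names taken from the comment above; semantics here are assumed.
interface DiffTolerance {
  pixelCount?: number; // max number of differing pixels allowed
  pixelRatio?: number; // max fraction (0..1) of differing pixels allowed
}

// Counts pixels whose RGBA bytes differ in any channel (4 bytes per pixel).
function countDifferentPixels(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) throw new Error("size mismatch");
  let count = 0;
  for (let i = 0; i < a.length; i += 4) {
    if (
      a[i] !== b[i] ||
      a[i + 1] !== b[i + 1] ||
      a[i + 2] !== b[i + 2] ||
      a[i + 3] !== b[i + 3]
    ) {
      count++;
    }
  }
  return count;
}

// Assumption: when both limits are given, the stricter one applies;
// when neither is given, no differing pixels are allowed.
function withinTolerance(
  a: Uint8Array,
  b: Uint8Array,
  tolerance: DiffTolerance
): boolean {
  const different = countDifferentPixels(a, b);
  const total = a.length / 4;
  const limits: number[] = [];
  if (tolerance.pixelCount !== undefined) limits.push(tolerance.pixelCount);
  if (tolerance.pixelRatio !== undefined)
    limits.push(tolerance.pixelRatio * total);
  const allowed = limits.length > 0 ? Math.min(...limits) : 0;
  return different <= allowed;
}
```

Unlike a blur, this kind of tolerance leaves the images untouched and only changes how many mismatched pixels are accepted.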

Feb 18 '22 15:02 aslushnikov

Thank you for improving visual regression features, @aslushnikov!

You may find the implementation experience of gemini-testing project useful. Some pointers:

Several years ago we had great success using Gemini for visual regression testing. We used Gemini's built-in web UI (either Gemini GUI or html-reporter, I don't remember which) to choose changed images worth committing to Git. During PR review we used the built-in GitHub image diff. We had very few false positives in image diffing. Unfortunately, the false positive rate was not zero, mostly due to subtle browser timing or random fluctuations.

Gemini is deprecated now, replaced by Hermione, from the same authors. I haven't used it, but it seems to use the same approach for image diffing. The core is in looks-same and gemini-core libraries.

Feb 18 '22 18:02 shamrin

Thanks @shamrin for the pointers! I'll read your links in more detail later to get a better understanding, but so far we already do all of these:

instead of using CIEDE2000, pixelmatch uses color difference in YIQ color space
pixelmatch uses the same algorithm based on the same whitepaper to ignore anti-aliasing
we hide text input caret on the browser level before making a screenshot
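For readers unfamiliar with the YIQ approach, the sketch below shows the general shape of a YIQ-based color difference. The coefficients and the 35215 normalization constant are reproduced from memory of pixelmatch's source and should be treated as approximate; the function names and the default threshold of 0.1 are choices made for this example.

```typescript
type Rgb = [number, number, number];

// RGB -> YIQ channel conversions (coefficients as recalled from pixelmatch).
function rgb2y(r: number, g: number, b: number): number {
  return r * 0.29889531 + g * 0.58662247 + b * 0.11448223;
}
function rgb2i(r: number, g: number, b: number): number {
  return r * 0.59597799 - g * 0.2741761 - b * 0.32180189;
}
function rgb2q(r: number, g: number, b: number): number {
  return r * 0.21147017 - g * 0.52261711 + b * 0.31114694;
}

// Weighted squared distance between two colors in YIQ space.
function yiqDelta(c1: Rgb, c2: Rgb): number {
  const [r1, g1, b1] = c1;
  const [r2, g2, b2] = c2;
  const y = rgb2y(r1, g1, b1) - rgb2y(r2, g2, b2);
  const i = rgb2i(r1, g1, b1) - rgb2i(r2, g2, b2);
  const q = rgb2q(r1, g1, b1) - rgb2q(r2, g2, b2);
  return 0.5053 * y * y + 0.299 * i * i + 0.1957 * q * q;
}

// Largest value the delta can take, used to scale a 0..1 threshold.
const MAX_YIQ_DELTA = 35215;

// True if the two colors differ by more than the given threshold (0..1).
function pixelsDiffer(c1: Rgb, c2: Rgb, threshold: number = 0.1): boolean {
  return yiqDelta(c1, c2) > MAX_YIQ_DELTA * threshold * threshold;
}
```

Because luminance (Y) carries the largest weight, small hue shifts count for less than brightness changes of the same magnitude.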

Feb 18 '22 19:02 aslushnikov

Hey @aslushnikov! I updated @florianbepunkt's original port of jest-image-snapshot to the Playwright test runner here: https://github.com/ayroblu/playwright-image-snapshot. Basically, it looks VERY similar to Playwright's existing golden.ts compare API, as you can see in matcher.ts.

The main benefit is that it uses SSIM. I also updated how the diff is rendered so it's similar to pixelmatch's greyscale background, which is super useful.

    expect(await page.screenshot()).toMatchImageSnapshot(test.info(), [
      name,
      "1-initial-load.png",
    ]);

I would love to have this SSIM option ported to Playwright Test, since TestInfo is not exposed implicitly, which makes the API usage a bit ugly. I made a PR: #12258. I'm also hoping not to need to supply a file name by default; it seems unnecessary.

Feb 20 '22 12:02 ayroblu

For the record: docker integration depends on global fixtures, so moving them forward.

Mar 28 '22 23:03 aslushnikov

Hi, @aslushnikov! Is it possible that in one of the next releases you will implement a "slider" diff in the HTML report? There are cases where the slider is more convenient than pixel highlighting, especially when the expected and actual screenshots differ in length.

Would it be possible to add one more tab to the report, alongside Diff/Actual/Expected?

Alternatively, all three states could be displayed on one tab in the report (as shown in the attachments of this comment).

Mar 29 '22 11:03 bezyakina

@bezyakina not sure for 1.21 (we're about to finalize this version), but still possible! It all depends on how much our users need it.

So could you please file this separately to our bug tracker as a feature request? The more likes / upvotes it will collect, the higher priority will be for us, and the faster we'll implement it!

Mar 29 '22 20:03 aslushnikov

@bezyakina not sure for 1.21 (we're about to finalize this version), but still possible! It all depends on how much our users need it.

So could you please file this separately to our bug tracker as a feature request? The more likes / upvotes it will collect, the higher priority will be for us, and the faster we'll implement it!

thanks for your reply, created a new feature request - https://github.com/microsoft/playwright/issues/13176

Mar 30 '22 09:03 bezyakina

Hey there! Not sure if it would be better to open another feature request, but https://github.com/jz-jess/RobotEyes has an interesting feature: it can ignore an array of UI elements in the image comparison. These elements are blurred, which helps achieve a higher comparison fidelity (95%+). RobotEyes uses ImageMagick in the background, which is a really powerful tool for image comparison.

The idea is to exclude data elements from the screen before the comparison is done. Accounting for such elements otherwise would require setting a different tolerance for each web page in the application, since each page can have a different number of data-bearing UI elements. I've seen comments about blur, but it doesn't seem to be related to this... Thank you.

Apr 13 '22 13:04 AllanMedeiros

@AllanMedeiros you can use the mask api to mask elements on the screenshot. This should help!
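Conceptually, masking paints over the chosen regions so they are identical in both images and can never contribute to the diff. The sketch below shows that idea on a raw RGBA buffer; `Rect` and `applyMask` are hypothetical names for this illustration and do not show how the mask API itself is called or implemented.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Returns a copy of the image with each rectangle filled with a solid color.
// Rectangles are clipped to the image bounds.
function applyMask(
  rgba: Uint8Array,
  imageWidth: number,
  imageHeight: number,
  rects: Rect[],
  color: [number, number, number] = [255, 0, 255]
): Uint8Array {
  const out = rgba.slice();
  for (const r of rects) {
    const x0 = Math.max(0, r.x);
    const y0 = Math.max(0, r.y);
    const x1 = Math.min(imageWidth, r.x + r.width);
    const y1 = Math.min(imageHeight, r.y + r.height);
    for (let y = y0; y < y1; y++) {
      for (let x = x0; x < x1; x++) {
        const i = (y * imageWidth + x) * 4;
        out[i] = color[0];
        out[i + 1] = color[1];
        out[i + 2] = color[2];
        out[i + 3] = 255; // fully opaque
      }
    }
  }
  return out;
}
```

Since the masked regions are excluded outright, a single tolerance can be used across pages regardless of how many dynamic elements each one contains.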

Apr 13 '22 19:04 aslushnikov