Add Playwright chart screenshot tests
Work-in-progress, following up from #261
Without any max pixel differences set we get the following errors in CI (screenshots generated locally on my machine):
/example-ssr/MultiLine    11 pixels (ratio 0.01 of all image pixels)
/example-ssr/AreaStacked 516 pixels (ratio 0.01 of all image pixels)
/example/AreaStacked     274 pixels (ratio 0.01 of all image pixels)
I was previously guessing that the ratio of pixels might be related to 'visible' (or non-background) pixels. That is not the case.
The ratio is calculated in Playwright here:
const ratio = Math.ceil(count / (expected.width * expected.height) * 100) / 100;
https://github.com/microsoft/playwright/blob/1924b51d3f536a51cc2386c119ab922250dbf783/packages/playwright-core/src/server/utils/comparators.ts#L90
For both 11 and 516 pixels the formula gives 0.01 for an image size of 800x201:
> Math.ceil(11 / (800 * 201) * 100) / 100
0.01
> Math.ceil(516 / (800 * 201) * 100) / 100
0.01
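The rounding step explains why such different counts collapse to the same value: the ratio is rounded *up* to two decimal places, so any diff count up to 1% of all pixels reports as 0.01. As a standalone sketch of Playwright's formula (the `diffRatio` name is mine, not Playwright's):

```javascript
// Recreating Playwright's ratio rounding (see comparators.ts linked above).
// The ceil-to-two-decimals step means any diff count up to 1% of the
// image's pixels is reported as a ratio of 0.01.
const diffRatio = (count, width, height) =>
  Math.ceil((count / (width * height)) * 100) / 100;

console.log(diffRatio(11, 800, 201));   // 0.01
console.log(diffRatio(516, 800, 201));  // 0.01
console.log(diffRatio(2000, 800, 201)); // 0.02
```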
With a threshold of 0.25 the pixel differences are:
/example-ssr/MultiLine    11 pixels (ratio 0.01 of all image pixels)
/example-ssr/AreaStacked 327 pixels (ratio 0.01 of all image pixels)
/example/AreaStacked      90 pixels (ratio 0.01 of all image pixels)
For a 0.3 threshold (Artifact Download):
/example-ssr/MultiLine    11 pixels (ratio 0.01 of all image pixels)
/example-ssr/AreaStacked 272 pixels (ratio 0.01 of all image pixels)
/example/AreaStacked      44 pixels (ratio 0.01 of all image pixels)
0.3 was suggested in a Playwright issue thread.
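For reference, a minimal sketch of where these knobs would go, assuming a standard `playwright.config.js` (values illustrative, not necessarily what this PR uses):

```javascript
// playwright.config.js — sketch; only the screenshot-related options shown
import { defineConfig } from '@playwright/test';

export default defineConfig({
  expect: {
    toHaveScreenshot: {
      // pixelmatch per-pixel color threshold, 0 (strict) to 1 (lax)
      threshold: 0.3,
      // absolute number of pixels allowed to differ
      maxDiffPixels: 100,
      // relative share of pixels allowed to differ
      maxDiffPixelRatio: 0.01,
    },
  },
});
```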
Trying Firefox, there are lots of failing screenshot comparisons (compared with locally generated ones), many of them minor font differences.
There are lots of discussions suggesting Chrome flags to minimise rendering differences, but these didn't change the diffs (the differing values are half pixels or percentages with multiple digits, e.g. grid line positions).
Here is an example discussion: https://stackoverflow.com/questions/79094715/disable-hardware-influence-on-playwright-tests-using-chromium-driver
Trying different headless modes or a newer version didn't solve the problem either.
This is the next one on the list, right?
Unfortunately not ready yet - I'll write more on this soon.
Currently, I am not sure it's possible to have these tests running in CI with a good developer experience: either you get incorrectly reported test failures (e.g. from slightly shifted axis labels or lines), or larger changes might be missed.
I have not found a combination of pixelmatch threshold and relative or absolute pixel counts that works well.
For example, when I merge this into #281 to test the components I have to lower the pixelmatch threshold to 0.1 to get a failing test for a clearly off Sankey chart.
I think the very light color is considered too visually similar by the algorithm.
For other components with problems, like CirclePack, the failure is detected even with a higher threshold.
However, with a low threshold of 0.1 (or even 0) there are then all the failures reported where there is a half-pixel shift like for AreaStacked and others.
Having read a lot of threads on this and tried the strategies suggested there (e.g. hiding some parts with CSS; others have applied blurring to minimize the effect of half-pixel shifts or minor browser differences), I am not sure whether this would cause too much friction.
It's annoying if PRs are failing due to unrelated changes and/or are failing locally and one has to click through to check. The workflow in a PR with having to download the report from an artifact URL is also cumbersome.
I still think that these screenshots can be extremely useful, especially when working on larger scale changes in Layer Cake.
I thought it might be useful to have the failing screenshots attached as a PR comment so that they could quickly be checked, but I'm not sure if it's possible to have this running without using an external service for hosting the images.
Another option might be to simply skip the tests when running in GitHub CI. They still would be very useful when running and developing locally.
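If we go the skip-in-CI route, one option (a sketch, relying on the `CI` environment variable that GitHub Actions sets) would be Playwright's `ignoreSnapshots` config option, which skips `toHaveScreenshot()`/`toMatchSnapshot()` assertions while still running the rest of each test:

```javascript
// playwright.config.js — sketch: run the tests everywhere, but ignore
// screenshot/snapshot assertions in CI (GitHub Actions sets CI=true)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  ignoreSnapshots: !!process.env.CI,
});
```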
It might also be worth adding a screenshot test for https://layercake.graphics/components or https://layercake.graphics, as these show all (?) charts on a single page. This would speed up testing and one could probably still skim a diff quickly.
Thanks for the write-up and for wrestling with it. That makes sense to me. Let's keep it as a gut-check to make sure that the svelte-5 migration doesn't introduce glaring errors, but it seems too brittle to introduce as a CI step.
Should we wrap this up as a local test function and merge it?
I started looking into it; I think they need to go into a separate directory.
I came across jest-image-snapshot and its Playwright port and saw their usage of glur for blurring to facilitate image comparison. They also use a quite low pixelmatch threshold.
This made me think about CSS options for blurring with Playwright and that was actually quite straightforward. With some blur applied all breaking pages from #281 get correctly detected with this PR. The minor anti-aliasing pixel shifts in axis tick labels or grid lines are blurred away.
So I think this might actually work in CI as well. Give it a spin on your machine, it might require further tweaking of the blur or threshold values.
I've experimented with generating the chart screenshots in CI (if they are missing). This should minimize differences and would enable a workflow where one would delete the current screenshots if there are desired changes and they would automatically be generated.
I'm not sure if this works from a pull request (even with allowed edits), so I'd probably not directly include this in this PR, but it could be useful for tweaking the thresholds, I think.
See https://github.com/rgieseke/layercake/pull/1
I've cherry-picked the CI-generated screenshots into this PR. When I locally run the tests they pass, but fail with a lower threshold of 0.05, so the values are probably ok.
Sorry for my slowness on this one. I checked it out and it looks good! I had to add this network idle check to get the download test to pass.
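For context, a sketch of what such a network idle check typically looks like in Playwright (exact placement may differ in the actual diff): wait until no network requests are in flight before taking the screenshot, so the test doesn't race the page's resources.

```javascript
// Sketch: settle the page before the screenshot comparison
await page.goto('/example/AreaStacked'); // illustrative page
await page.waitForLoadState('networkidle');
await expect(page).toHaveScreenshot();
```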
The first time I ran it, the tests failed saying there are no tests, because there is nothing committed for darwin. Probably best for me to commit darwin Playwright screenshots. Otherwise I think this is good to merge!
@rgieseke let me know if there's anything else you want to add to this one. Otherwise, we'll merge!
I think once this is merged, I can finish the rest of the migration this week. I'd love to finish it before I'm set to give a tutorial on Friday.
Just went through it. Looks good to me! We can tweak if need be.
Awesome, and "tweak if need be" would have been my reply!