Notify users when regressions are detected

Open foolip opened this issue 6 years ago • 9 comments

Related to https://github.com/web-platform-tests/wpt.fyi/issues/866

@beaufortfrancois has reported this recent regression in picture-in-picture:

  • good: https://wpt.fyi/results/picture-in-picture?product=chrome[experimental,taskcluster]&sha=5dd605cdfa
  • bad: https://wpt.fyi/results/picture-in-picture?product=chrome[experimental,taskcluster]&sha=c522b884f7

The full diff between those runs: https://wpt.fyi/results/?label=experimental&product=chrome%5Btaskcluster%5D%405dd605cdfa&product=chrome%5Btaskcluster%5D%40c522b884f7&diff

There are lots of regressions, introduced by https://github.com/web-platform-tests/wpt/pull/13966. We should have detected that before landing the changes, but even if it had been contained to picture-in-picture, we should still have detected the regression when it happened. In addition to changes in the wpt repo, infrastructure changes and browser releases can also cause regressions.

@jgraham @gsnedders FYI

foolip avatar Dec 13 '18 09:12 foolip

Constructing the diff URLs in https://github.com/web-platform-tests/wpt/issues/14495 was quite time-consuming, so automatically comparing all pairs of master runs for a configuration would help immensely.
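
To illustrate, the diff URL follows a regular pattern, so generating one per pair of consecutive master runs could look roughly like the sketch below. The query format is inferred only from the hand-built diff URL earlier in this issue, and `product_spec`, `diff_url` and `consecutive_diff_urls` are hypothetical helper names, not part of wpt.fyi.

```python
from typing import List
from urllib.parse import quote

WPT_FYI_RESULTS = "https://wpt.fyi/results/"


def product_spec(browser: str, label: str, sha: str) -> str:
    """Build a product spec such as chrome[taskcluster]@5dd605cdfa."""
    return f"{browser}[{label}]@{sha}"


def diff_url(browser: str, label: str, before_sha: str, after_sha: str,
             run_label: str = "experimental") -> str:
    """Return a wpt.fyi diff URL for two runs of the same configuration.

    Assumption: the query layout mirrors the manually written URL in
    this issue (label=..., two product=... parameters, then &diff).
    """
    before = quote(product_spec(browser, label, before_sha), safe="")
    after = quote(product_spec(browser, label, after_sha), safe="")
    return (f"{WPT_FYI_RESULTS}?label={quote(run_label, safe='')}"
            f"&product={before}&product={after}&diff")


def consecutive_diff_urls(browser: str, label: str, shas: List[str]) -> List[str]:
    """One diff URL per pair of adjacent runs, oldest SHA first."""
    return [diff_url(browser, label, a, b) for a, b in zip(shas, shas[1:])]
```

Calling `diff_url("chrome", "taskcluster", "5dd605cdfa", "c522b884f7")` reproduces the full-diff link from the issue description.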

foolip avatar Dec 13 '18 09:12 foolip

Doing https://github.com/web-platform-tests/wpt/issues/13263 would mitigate the need for this somewhat, but I'd still love to have links to diffs for each master run, even if there aren't regressions.

foolip avatar Dec 13 '18 09:12 foolip

@foolip - the wpt.fyi custom check does compare each full master run to the most recent master run for the same product, and surfaces the regressions. It's noisy due to flakiness, but is that what this issue was asking for?

lukebjerring avatar Jan 16 '19 19:01 lukebjerring

Flakiness isn't what I wanted to discover; what I have in mind is a change to testharness.js or something in tools/ breaking a lot of tests. In other words, we'd need to compare two runs, somehow account for flakiness, and see if there are regressions outside of the affected tests. If there are, some human should be notified.

foolip avatar Jan 16 '19 19:01 foolip

OK - Then this isn't a wpt.fyi issue. wpt.fyi does detect regressions, but it gets ignored because of the flaky tests being noisy. The remaining work lies in fixing the flaky tests.

lukebjerring avatar Jan 16 '19 19:01 lukebjerring

Well, dealing with flakiness might be a blocker to making this useful, but there's still wpt.fyi work:

  • see if there are regressions outside of the affected tests (or just if there are very many regressions)
  • human should be notified

Currently, even if 1000 tests started to fail on master, we'd only notice it by the numbers or colors on wpt.fyi looking unfamiliar to us.
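
As a rough sketch of the two bullets above: filter the regressions between two runs down to those that are neither known-flaky nor under the paths a commit touched, then notify someone if enough remain. Everything here is an assumption for illustration: results are modelled as plain `{test path: status}` dicts, treating only `PASS`/`OK` as passing is a simplification, and `notify` is a hypothetical callback (email, GitHub comment, etc.), not an existing wpt.fyi API.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List

# Simplification: any other status (FAIL, TIMEOUT, ERROR, ...) counts as failing.
PASSING = {"PASS", "OK"}


def find_regressions(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Tests that passed in the earlier run and do not pass in the later one."""
    return sorted(test for test, status in after.items()
                  if before.get(test) in PASSING and status not in PASSING)


def unexpected_regressions(before: Dict[str, str], after: Dict[str, str],
                           affected_prefixes: Iterable[str] = (),
                           known_flaky: FrozenSet[str] = frozenset()) -> List[str]:
    """Drop regressions that are known-flaky or inside the changed paths."""
    prefixes = tuple(affected_prefixes)
    return [test for test in find_regressions(before, after)
            if test not in known_flaky and not test.startswith(prefixes)]


def check_and_notify(before: Dict[str, str], after: Dict[str, str],
                     notify: Callable[[str], None],
                     affected_prefixes: Iterable[str] = (),
                     known_flaky: FrozenSet[str] = frozenset(),
                     threshold: int = 1) -> List[str]:
    """Call notify() once if at least `threshold` unexpected regressions remain."""
    regressions = unexpected_regressions(before, after, affected_prefixes, known_flaky)
    if len(regressions) >= threshold:
        notify(f"{len(regressions)} unexpected regression(s), e.g. {regressions[0]}")
    return regressions
```

Raising `threshold` covers the "very many regressions" variant: a handful of flaky failures stays quiet, while a harness-level breakage that takes down hundreds of tests would trigger a notification.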

foolip avatar Jan 17 '19 01:01 foolip

Travis and Azure Pipelines can send email notifications on failure; maybe something like that could work? @lukebjerring, I think you already had some ideas about notifications for other purposes?

foolip avatar Jan 17 '19 14:01 foolip

I'm confused. The custom checks run for the full test suite comparison, on master for all commits. Example: https://github.com/web-platform-tests/wpt/runs/51187555 https://github.com/web-platform-tests/wpt/commits/master

Changing the title to reflect the change that you're asking for.

lukebjerring avatar Jan 17 '19 14:01 lukebjerring

Right, I know that the checks exist, but you have to go looking for them.

foolip avatar Jan 17 '19 14:01 foolip