distributed Way of identifying tests on the dashboard that have been fixed

Currently, the short test report dashboard shows 5 recently failing tests on main.

But 3 of those should already be fixed, and as you can see, they're all green in more recent runs:

https://github.com/dask/distributed/pull/6954
https://github.com/dask/distributed/pull/6955

It would be nice if there was a way for a PR to instruct the test dashboard that a test is expected to be fixed, and hide it immediately (as long as it's all green subsequent to that PR). Then, the dashboard would more accurately reflect the latest status of what tests are flaky. Or, if an expected-fixed test failed, it could highlight prominently that the fix didn't work.

Current dashboard, for reference:

cc @hendrikmakait @ian-r-rose @fjetter

Aug 29 '22 16:08 gjoseph92

Any suggestions on how this could be implemented? The implementation effort should stay within reason for a feature like this, and I don't see a straightforward way of doing that.

Aug 30 '22 07:08 fjetter

I'm not convinced the additional complexity to the chart generation is worth the effort here -- right now the chart is completely generable from the test result XML stored as artifacts. This would require an additional channel of information somewhere (I suppose from crawling PR names and matching via some magic string?). But if a flaky test is indeed fixed by a PR, it should already be removed from the chart in a week or less.
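For context, here is a minimal sketch of how failures could be read from one stored result file. It assumes the artifacts are JUnit-style XML (the kind pytest writes with `--junitxml`); the function name `failed_tests` is hypothetical and this is not the dashboard's actual code.

```python
import xml.etree.ElementTree as ET


def failed_tests(junit_xml: str) -> set[str]:
    """Return "classname.name" for every test case that failed or errored.

    Assumes JUnit-style XML: <testcase> elements carrying a <failure> or
    <error> child when the test did not pass.
    """
    root = ET.fromstring(junit_xml)
    failed = set()
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.add(f"{case.get('classname')}.{case.get('name')}")
    return failed
```

Everything the chart needs comes from files like this, which is why any "expected fixed" marker would have to arrive through a second channel.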

I also think there is a certain amount of satisfaction derived from looking at a field of green after a flaky test is fixed, at least for a few days :)

Aug 30 '22 15:08 ian-r-rose

> suppose from crawling PR names and matching via some magic string

The action runs in the git repo with git history, so this seems doable:

git log --since="7 days ago" --format="%b" --grep="Fixes test" | sed -nr "s/^Fixes test (.+)$/\1/p"

When a PR gets merged, the person merging would have to manually add Fixes test distributed.tests.foo.bar to the commit description.
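If the report generation is done in Python, the same extraction could look roughly like the sketch below. The `Fixes test <name>` marker is the convention proposed above; the function names are hypothetical, and unlike the `sed` expression this version only captures a single whitespace-free test name per line.

```python
import re
import subprocess

# Assumed convention: a commit-body line of the form
# "Fixes test <dotted.test.name>", starting at the beginning of the line.
FIXES_TEST = re.compile(r"^Fixes test (\S+)[ \t]*$", re.MULTILINE)


def extract_fixed_tests(commit_bodies: str) -> set[str]:
    """Return the test names declared as fixed in the given commit-body text."""
    return set(FIXES_TEST.findall(commit_bodies))


def fixed_tests_from_git(since: str = "7 days ago") -> set[str]:
    """Collect declared fixes from recent git history (hypothetical helper)."""
    out = subprocess.run(
        ["git", "log", f"--since={since}", "--format=%b", "--grep=Fixes test"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return extract_fixed_tests(out)
```

The resulting set could then be used to hide tests that have stayed green since the fix, or to flag them if they fail again.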

I'd also thought about a towncrier-like model, where you could add a file to a special fixed-tests directory or something naming the tests a PR fixes. That's a little more complicated though, probably not worth it.

Aug 30 '22 16:08 gjoseph92

> When a PR gets merged, the person merging would have to manually add Fixes test distributed.tests.foo.bar to the commit description.

I think the data quality would be horrible. I doubt that implementing a dashboard change based on this would be worth our time.

If this is really a problem, what I could see us doing is to change the report generation logic a bit to something like:

- Only show tests that have failures in the past X days
- Show history for these failing tests for up to Y > X days

This way we'd filter out things like test_local_directory in the above example more quickly but would still get rich history for those tests that are still an issue.
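The X/Y-day rule above could be sketched as follows. The `Result` record and `select_for_report` function are hypothetical names for illustration; the real report code may structure its data quite differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Result:
    test: str        # fully qualified test name
    when: datetime   # time of the CI run
    failed: bool     # True if the test failed in that run


def select_for_report(results, now, x_days, y_days):
    """Keep tests that failed within x_days; return their y_days history."""
    if y_days <= x_days:
        raise ValueError("y_days must be greater than x_days")
    recent_cutoff = now - timedelta(days=x_days)
    history_cutoff = now - timedelta(days=y_days)

    # Tests with at least one failure in the short (X-day) window.
    shown = {r.test for r in results if r.failed and r.when >= recent_cutoff}

    # For those tests only, collect the longer (Y-day) history.
    history = {}
    for r in results:
        if r.test in shown and r.when >= history_cutoff:
            history.setdefault(r.test, []).append(r)
    return history
```

With a small X, a test that was fixed drops off the chart after X green days, while tests that are still failing keep their full Y-day history.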

I agree with Ian about the satisfaction of a green wall, though. I'm not entirely convinced yet that this is worth our time

Aug 31 '22 10:08 fjetter

I'm not convinced it's worth our time either. I think this is pretty low priority.

Aug 31 '22 22:08 gjoseph92
