Way of identifying tests on the dashboard that have been fixed
Currently, the short test report dashboard shows 5 recently failing tests on main.
But 3 of those should already be fixed, and as you can see, they're all green in more recent runs:
- https://github.com/dask/distributed/pull/6954
- https://github.com/dask/distributed/pull/6955
It would be nice if a PR could tell the test dashboard that a test is expected to be fixed, so the dashboard hides it immediately (as long as every run after that PR is green). Then the dashboard would more accurately reflect which tests are currently flaky. And if an expected-fixed test failed again, the dashboard could prominently highlight that the fix didn't work.
Current dashboard, for reference:

cc @hendrikmakait @ian-r-rose @fjetter
Any suggestions on how this should be implemented? The implementation effort should stay within reason for a feature like this, and I don't see a straightforward way of doing that.
I'm not convinced the additional complexity in the chart generation is worth the effort here. Right now the chart can be generated entirely from the test result XML stored as artifacts. This feature would require an additional channel of information somewhere (I suppose from crawling PR names and matching via some magic string?). But if a flaky test is indeed fixed by a PR, it should already drop off the chart in a week or less.
I also think there is a certain amount of satisfaction derived from looking at a field of green after a flaky test is fixed, at least for a few days :)
> suppose from crawling PR names and matching via some magic string
The action runs in the git repo with git history, so this seems doable:
git log --since="7 days ago" --format="%b" --grep="Fixes test" | sed -nr "s/^Fixes test (.+)$/\1/p"
When a PR gets merged, the person merging would have to manually add Fixes test distributed.tests.foo.bar to the commit description.
I'd also thought about a towncrier-like model, where you could add a file to a special fixed-tests directory or something naming the tests a PR fixes. That's a little more complicated though, probably not worth it.
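For illustration, the commit-message idea above could be sketched in Python roughly as follows. This is only a sketch under assumptions: the `Fixes test <name>` marker is the convention proposed in this comment, the function names are hypothetical, and nothing here reflects how the existing report script works. Unlike the `sed` expression, it captures a single whitespace-free test name per line.

```python
import re
import subprocess
from typing import Set

# Assumed marker convention from this thread: a commit body line of the form
# "Fixes test distributed.tests.foo.bar". Only one name per line is captured.
FIXES_RE = re.compile(r"^Fixes test (\S+)[ \t]*$", re.MULTILINE)


def extract_fixed_tests(commit_bodies: str) -> Set[str]:
    """Return the test names announced as fixed in the given commit bodies."""
    return set(FIXES_RE.findall(commit_bodies))


def recently_fixed_tests(since: str = "7 days ago") -> Set[str]:
    """Collect fixed-test markers from recent git history (hypothetical helper).

    Must be run inside a git checkout that has the relevant history.
    """
    result = subprocess.run(
        ["git", "log", f"--since={since}", "--format=%b", "--grep=Fixes test"],
        check=True,
        capture_output=True,
        text=True,
    )
    return extract_fixed_tests(result.stdout)
```

The parsing is kept separate from the `git log` call so it can be checked against plain strings without a repository.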
> When a PR gets merged, the person merging would have to manually add Fixes test distributed.tests.foo.bar to the commit description.
I think the data quality would be horrible. I doubt that implementing a dashboard change based on this would be worth our time.
If this is really a problem, what I could see us doing is changing the report generation logic a bit, to something like:
- Only show tests that have failures in the past X days
- Show history for these failing tests for up to Y > X days
This way we'd filter out things like test_local_directory in the above example more quickly but would still get rich history for those tests that are still an issue.
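A minimal sketch of that two-window filter, assuming the per-test history is already available as `(timestamp, passed)` pairs. The record shape, the function name `select_tests`, and the defaults of 7 and 30 days are all placeholders for illustration; X and Y are not specified in this thread.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical record shape: (time of the CI run, whether the test passed).
Run = Tuple[datetime, bool]


def select_tests(
    history: Dict[str, List[Run]],
    now: datetime,
    fail_window_days: int = 7,   # "X": assumed value
    history_days: int = 30,      # "Y": assumed value, must exceed X
) -> Dict[str, List[Run]]:
    """Keep tests that failed within X days, with up to Y days of history."""
    if history_days <= fail_window_days:
        raise ValueError("history_days (Y) must be greater than fail_window_days (X)")
    fail_cutoff = now - timedelta(days=fail_window_days)
    history_cutoff = now - timedelta(days=history_days)

    shown: Dict[str, List[Run]] = {}
    for name, runs in history.items():
        # A test is shown only if it has at least one recent failure.
        if any(ts >= fail_cutoff and not passed for ts, passed in runs):
            # For shown tests, keep the longer history window.
            shown[name] = [(ts, passed) for ts, passed in runs if ts >= history_cutoff]
    return shown
```

With these defaults, a test whose last failure was 20 days ago would be hidden, while a test that failed 3 days ago would be shown together with its runs from the past 30 days.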
I agree with Ian about the satisfaction of a green wall, though. I'm not entirely convinced yet that this is worth our time.
I'm not convinced it's worth our time either. I think this is pretty low priority.