User study feedback: Improve configuration information for test failures
Two things related to configuration information for test failures:
- Customers would prefer to see their test failures grouped by configuration because some configurations are more relevant to a user than others.
- Configuration information should be closer to the test failure header, not at the bottom where a user has to scroll down to discover it (especially painful for data-driven test results).
The problem is that if you break a test, we have to show 70+ configurations, which we don't have space for. And we'd have to repeat all the test details over and over (the callstack, for example), meaning now we'd probably only be able to show a fraction of a single test failure. I don't think that's the right pivot in 99% of cases.
Sure, for a Mac-specific PR, that would be nice... but that's not what most PRs are.
/cc @BruceForstall @radical @ViktorHofer @mdh1418
One option could be that if the test fails in some small number (3???) of configurations or fewer, then the configurations are shown. However, if not, then perhaps a header that shows the number of configurations that failed. Basically, it'd be this idea of adaptive screens based on certain heuristics.
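Roughly, that heuristic would look something like the sketch below (the threshold value, type name, and wording are placeholders, not anything Build Analysis does today):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the adaptive idea: list the failing configurations
// inline only when there are few of them; otherwise show just a count so the
// test details (error message, call stack) keep the available space.
public static class ConfigurationDisplay
{
    // The "3???" threshold from the comment above; purely a placeholder value.
    private const int InlineConfigurationLimit = 3;

    public static string Render(IReadOnlyList<string> failingConfigurations)
    {
        if (failingConfigurations.Count <= InlineConfigurationLimit)
        {
            // Few configurations: show them directly under the test header.
            return "Failed on: " + string.Join(", ", failingConfigurations);
        }

        // Many configurations: collapse to a count and let the user drill in elsewhere.
        return $"Failed on {failingConfigurations.Count} configurations";
    }
}
```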
I think the users really want us to be smart. And say things like "only windows", or "never on windows", or "only mac". But there are a lot of windows queues, so even if it's "only windows", it ends up being like 10 queues.
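As a rough illustration of that kind of OS-level rollup (the queue-name prefixes and the phrasing below are assumptions made for this sketch, not real Build Analysis logic):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: collapse a set of failing Helix queues into an
// OS-level statement like "Windows only" instead of listing ten queues.
public static class OsSummary
{
    // Very rough queue-name-to-OS mapping, invented for illustration.
    private static string OsOf(string queue) =>
        queue.StartsWith("windows", StringComparison.OrdinalIgnoreCase) ? "Windows"
        : queue.StartsWith("osx", StringComparison.OrdinalIgnoreCase) ? "macOS"
        : "Linux/other";

    public static string Summarize(IReadOnlyCollection<string> failingQueues, IReadOnlyCollection<string> allQueues)
    {
        var failingOses = failingQueues.Select(OsOf).ToHashSet();
        // OS families that ran the test but had no failures at all.
        var cleanOses = allQueues.Select(OsOf).ToHashSet();
        cleanOses.ExceptWith(failingOses);

        if (failingOses.Count == 1 && cleanOses.Count > 0)
            return $"{failingOses.Single()} only ({failingQueues.Count} queues)";
        if (cleanOses.Count == 1)
            return $"Everywhere except {cleanOses.Single()}";
        return $"Failed on {failingQueues.Count} queues across {failingOses.Count} OS families";
    }
}
```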
Definitely want to do something adaptive based on the data we're getting. We may not be able to display all the tests that fail for a certain configuration, but if AzDO is telling us that over a certain number of tests fail, we can list the failing configurations, the number of failing tests in that configuration, and provide a link to the related test panel.
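Assuming AzDO can give us per-configuration failure counts, the summary could render along these lines (the record shape, method name, and link text are placeholders):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical summary mode: one line per failing configuration with its
// failing-test count, plus a single link to the AzDO test panel instead of
// repeating every error message and call stack.
public record ConfigurationFailures(string Configuration, int FailingTests);

public static class SummaryRenderer
{
    public static string Render(IReadOnlyList<ConfigurationFailures> failures, string testPanelUrl)
    {
        var sb = new StringBuilder();
        sb.AppendLine($"{failures.Sum(f => f.FailingTests)} tests failed across {failures.Count} configurations:");
        foreach (var f in failures.OrderByDescending(f => f.FailingTests))
        {
            sb.AppendLine($"- {f.Configuration}: {f.FailingTests} failing");
        }
        sb.AppendLine($"[See all results in Azure DevOps]({testPanelUrl})");
        return sb.ToString();
    }
}
```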
Did most of the users express a preference for "config" being the top pivot? I'm really worried we'll break the normal case to make an edge case better here.
My gut says that in most scenarios, configuration is not the primary pivot; the error message/stack is (since you need that to take action). If we promote configuration, we, by definition, need to demote something else.
Based on my discussions, it sounded like if we were displaying more test failures than we could fit on the page, the devs would have to dig into AzDO anyway, so it was preferable not to show error message/stack trace information and instead take them right to AzDO. This seemed like a bigger pain point for Runtime; however, I would imagine that for repos that don't have as many configurations/tests as Runtime, the number of failures would be a lot smaller and we could go with our current flow.
This is what AzDO's test panel looks like:
It has the configuration and the number of failing tests (out of total tests, but we could probably omit the total). All we'd have to do is link them to that.
However, based on other conversations, we should figure out how we'd want to handle a summary scenario when there are Known Issues. Due to throttling from AzDO, it would probably be unlikely that we'd be able to process all of the failing tests to determine which have Known Issues associated with them, but we could provide a link to the Known Issues GitHub project for the devs to use.
This also might tie back to this issue: https://github.com/dotnet/arcade/issues/10732 where we might want to show Known Issues in the AzDO test results themselves.
We can always link them to that, but I don't understand why we need to avoid showing them more information. At that point we aren't really providing any value... there is already a pretty easy way to get to that page, and it's already serving the purpose of "show me all the tests grouped by configuration". Why do we have to throw away the useful stuff we are doing to restate the same information presented in the same way?
This sort of feels like "that's what I'm used to seeing, show me what I already have"... but... they already have what they already have, and we aren't proposing taking that away from them. Maybe we need to rearrange things a bit to make it clearer that you failed a TON of tests, but showing at least one test detail seems virtuous to me... in many of the "you failed 7000 tests" cases, you've broken a single, pretty fundamental thing, and if you just fix the single test that's being displayed, you'll probably fix them all.
The summary suggestion is for the case where more tests are failing than we are able to show in the check. We should still show them the current flow in the event there aren't more than X tests failing. Some folks did find the stack trace and error messaging useful (some would have liked them to be collapsed, but still available).

Even in that case, I think it's still useful. It's... unlikely that a PR really has 7000 unique bugs. Even if 7000 tests are failing, it's probably only a single problem, because if you managed to create 7000 unique bugs with a single PR, you are a master of bad coding.
So you don't need to look at all 7000 tests, just one. And if you've written good tests, you should be able to see the error message and callstack, and know what to do.
But if we just say, "meh, you failed too much", we remove the ability for people to have that good experience (and the incentive to write good, useful tests with useful names and error messages, since nothing is going to show them anyway), and I'm not sure what we gain by removing the information.
Even in the example you showed me, it's pretty clear what happened, just looking at the build analysis and without any familiarity with the repo: something in service startup broke, because all processes are crashing out with exit code 134. The fact that 3 tests all did that and all have the same exception message is incredibly helpful. I don't care about the other 6997 tests; I'm going to assume that this horrible thing I did is the same in every case, fix the one that was shown to me, push the commit, and assume they will all go away. If they don't, that's weird... if I somehow had exactly 2 bugs, one that happened to cause exactly 3 errors, and they happened to be the 3 that we picked to analyze...
Chatted a bit about this one in our v-team sync. Here are some suggestions moving forward that will hopefully help with displaying too much information and presenting it in a more user-friendly way:
- Moving the configurations the test is failing on to the top of the test. There was a concern that if a test ends up failing on several configurations, users would have to scroll past them to see the error information, so we will make the list of configurations collapsible: https://github.com/dotnet/arcade/issues/10931
- Not everyone asked to see their tests grouped by configuration or to see only a summary when there were more test failures than the page could display. However, since this was requested, my suggestion was to create another check, something like Build Analysis Test Summary, that contains that summary for folks who would appreciate that particular view.
I'm curious in what way the "Build Analysis Test Summary" would be better than the existing AzDO UI... better enough that we'd want to spend all that time and complicate our scenario?
I still like the idea of collapsible sections, with the choice of which ones are expanded by default being "smart". If folks still want a summary, then we can go from there. The summary ask strikes me as an implementation suggestion for the page being too busy and tough to decipher.
We'll move the configuration info to the top of the test that failed and collapse the list so that many configs aren't pushing the test failure data down.
PR: https://dnceng.visualstudio.com/internal/_git/dotnet-helix-service/pullrequest/27067
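For reference, a collapsed configuration list in the check's markdown could be emitted with a plain `<details>` block, along these lines (illustrative only, not taken from the linked PR):

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustrative only: GitHub-flavored markdown supports <details>/<summary>,
// so the configuration list can sit at the top of the failure without pushing
// the error message and call stack off the screen.
public static class CollapsibleConfigList
{
    public static string Render(IReadOnlyCollection<string> failingConfigurations) =>
        $"<details><summary>Failed on {failingConfigurations.Count} configurations (click to expand)</summary>\n\n"
        + string.Join("\n", failingConfigurations.Select(c => $"- {c}"))
        + "\n\n</details>";
}
```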
We just got the exact opposite request of this in the Teams channel, customers are complicated. :-) For what it's worth, I don't think the configuration information is the most important piece of a test, so I like it at the bottom too.
I think this is a good example of trying to look past the literal requests of users and figure out why they are asking for what they want, because we can't do two mutually exclusive things. But if we understand the underlying problem, it's possible we can address both of those scenarios well.
This rolled out. Closing!