Display Test Known Issues for tests in non-PR pipelines

Open AndyAyersMS opened this issue 2 years ago • 16 comments

  • [ ] This issue is blocking
  • [ ] This issue is causing unreasonable pain

We often see recurring/repeated failures in outerloop and other non-PR jobs. Would be nice if build analysis applied here too, as there is often substantial manual effort involved in triaging the failures we see in these jobs.

cc @JulieLeeMSFT

AndyAyersMS avatar Sep 02 '22 19:09 AndyAyersMS

Hi @AndyAyersMS, since Build Analysis is tied to Pull Requests, how are you envisioning the workflow would work for having Build Analysis run against non-PR jobs?

missymessa avatar Sep 02 '22 20:09 missymessa

Any failure in a non-PR job is in some sense an infrastructure failure or a pre-existing failure. The main thing that would be nice here is the ability to quickly see when these failures are known failures with open issues.

AndyAyersMS avatar Sep 02 '22 20:09 AndyAyersMS

I'll follow up with you offline to get more of an idea of how you're thinking this flow would work. :)

missymessa avatar Sep 02 '22 20:09 missymessa

They already are in there. If you go to one of the rolling builds, you can click here to get to the GitHub commit for the build: image

Then in the commit, you can click here: image

ChadNedzlek avatar Sep 02 '22 20:09 ChadNedzlek

If you are just looking at the commit history for main, it's a bit easier, you can click here to see the results: image

ChadNedzlek avatar Sep 02 '22 20:09 ChadNedzlek

Granted, GitHub's discoverability is so low there as to basically be zero, but we are running analysis on the rolling builds, and you can get there with 2 or 3 clicks... you just have to know where to click.

ChadNedzlek avatar Sep 02 '22 20:09 ChadNedzlek

Thanks @ChadNedzlek

How about for something like: https://dev.azure.com/dnceng/public/_build?definitionId=1000. For the recent runs one of the failures is in the GetTotalPauseDuration test.

There is an open issue for this: https://github.com/dotnet/runtime/issues/74877. How can I quickly discover that the failures I see here are covered by that issue?

AndyAyersMS avatar Sep 02 '22 21:09 AndyAyersMS

We don't have any analysis that spans multiple builds unless you use the "known issues" feature, which that issue doesn't appear to use. Build Analysis is focused on making specific PRs actionable. Investigating current issues isn't its primary focus, unfortunately. There are still some options not strictly related to the Build Analysis check, though:

You could use the analytics page of Azure DevOps to look at every failure for any given test in the last 7/14/30 days: https://dev.azure.com/dnceng/public/_test/analytics?definitionId=1000&contextType=build

Or you could use our Kusto database with some of the queries documented here: https://github.com/dotnet/arcade/blob/main/Documentation/AzureDevOps/TestReportingQueries.md

ChadNedzlek avatar Sep 02 '22 21:09 ChadNedzlek

There's also the "search test or build logs in the cloud" feature... (not sure the test search is turned on yet for /runtime, though)

markwilkie avatar Sep 02 '22 21:09 markwilkie

Had a chat with Andy about this feature request.

Essentially, for non-PR builds, knowing if a test is failing due to a known issue is near impossible to discover from the AzDO UI. It would save hours of investigation time if we were able to provide a list of known test issues on these pipeline runs, either as another column on the test results:

image

or as another extension here:

image

missymessa avatar Sep 02 '22 21:09 missymessa

I'm not sure how we would accomplish such a thing. I think for overarching test analysis, the Kusto database is probably the best bet. It's possible we should be including some metadata about known issues in there (tests that fail due to known issues aren't able to be found/filtered out in those tables)

ChadNedzlek avatar Sep 02 '22 21:09 ChadNedzlek

Spitballing solutions:

  1. Output the Build Analysis content to some kind of extension that would show up on the pipeline.
  2. If the test runs through Helix, Helix would look up failing tests to see if there was a known issue already open for them and link that data in the test results that would show up once they were posted on AzDO.
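To make option 2 a bit more concrete, here is a rough sketch of the lookup step. Everything in it is an assumption for illustration only: the `KnownIssue` type, the substring matching rule, and the sample test names are hypothetical, and this is not how Helix or Build Analysis actually work. The only real value is the issue URL linked earlier in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownIssue:
    """Hypothetical record for an open issue that tracks a test failure."""
    url: str
    test_name_fragment: str  # assumed matching rule: substring of the test name


def link_known_issues(failing_tests, known_issues):
    """Map each failing test name to the URLs of known issues that match it."""
    links = {}
    for test in failing_tests:
        links[test] = [
            issue.url
            for issue in known_issues
            if issue.test_name_fragment in test
        ]
    return links


# Sample data: the issue URL comes from this thread; the test names are made up.
known = [
    KnownIssue(
        url="https://github.com/dotnet/runtime/issues/74877",
        test_name_fragment="GetTotalPauseDuration",
    )
]
failures = ["GC.API.GetTotalPauseDuration", "Some.Other.Test"]

result = link_known_issues(failures, known)
for test, urls in result.items():
    print(test, "->", urls or "no known issue")
```

Tests with an empty list would still need manual triage. The matching rule is the part most likely to need real design work.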

missymessa avatar Sep 02 '22 22:09 missymessa

We could try to add an extension to AzDO, but that's a fair bit of dev time and ongoing maintenance, since it can't be "GitHub flavored markdown", so we'd need an entirely separate rendering of it as pure HTML+CSS (or some other markdown engine, which will be expensive).

I don't think Azure DevOps is the right place for attempting to do this sort of build-spanning analysis. The costs of us attempting to twist it into that are going to be high, and I think more flexible solutions (Kusto or the infamous "single landing page") are probably the right features to meet this scenario.

ChadNedzlek avatar Sep 02 '22 22:09 ChadNedzlek

/cc @radical

missymessa avatar Sep 06 '22 17:09 missymessa

since it can't be "GitHub flavored markdown"

A simple list of issues should be good enough. We do something similar using the "Extensions" feature of AzDO. E.g. https://dev.azure.com/dnceng-public/public/_build/results?buildId=61917&view=ms.vss-build-web.run-extensions-tab

I think we already have the data for these failures in the same place that "Build Analysis" gets its data from.
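As a minimal sketch of what "a simple list of issues" could look like when rendered as plain HTML rather than GitHub flavored markdown: the `render_issue_list` helper and the issue title below are made up for illustration, and nothing here uses or reflects the actual AzDO extension API.

```python
from html import escape


def render_issue_list(issues):
    """Render (title, url) pairs as a plain HTML unordered list."""
    if not issues:
        return "<p>No known issues matched.</p>"
    items = "".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in issues
    )
    return f"<ul>{items}</ul>"


# The title is hypothetical; the URL is the issue linked earlier in this thread.
html_out = render_issue_list(
    [("GetTotalPauseDuration failure", "https://github.com/dotnet/runtime/issues/74877")]
)
print(html_out)
```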

kunalspathak avatar Oct 25 '22 18:10 kunalspathak

I wonder if this isn't mostly a report. It'd only work for pipelines that point to GH, but in talking with Andy, this seems sufficient.

markwilkie avatar Oct 26 '22 16:10 markwilkie