Gather publishing metrics
We should gather publishing metrics for each publish so that we can track some key items that may affect performance (a rough sketch of what this could capture follows the list):
- Number of files
- How big is the input file set
- How much data is actually sent (e.g. we may download all of PackageArtifacts and BlobArtifacts, but then publish all of PackageArtifacts to two locations and all of BlobArtifacts to one)
- How long nuget pushes take
- How long file pushes take
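To make the list above concrete, here is a rough sketch of the kind of data this would mean capturing per publish. None of this exists in Arcade today; the type and property names are invented purely for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container for the metrics listed above; not an existing Arcade type.
public class PublishingMetrics
{
    // Number of files in the input set (packages + blobs).
    public int InputFileCount { get; set; }

    // Total size of the downloaded input file set, in bytes.
    public long InputBytes { get; set; }

    // Bytes actually uploaded, which can exceed InputBytes when the same
    // artifacts are published to multiple locations (e.g. two package feeds).
    public long UploadedBytes { get; set; }

    // Wall-clock time spent in nuget pushes and in blob/file pushes.
    public TimeSpan NuGetPushDuration { get; set; }
    public TimeSpan FilePushDuration { get; set; }

    // Per-target breakdown, e.g. "feed:dotnet-eng" -> bytes pushed.
    public Dictionary<string, long> BytesPerTarget { get; } = new Dictionary<string, long>();
}
```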
@Chrisboh @chcosta I think the timeline data would be great to use for this, as we could plot it over time. @adiaaida said it looks like right now we're only tracking warnings and errors, though, and we wouldn't want to use those for this. Is there another logging option we could use to get this into the Kusto DB?
Gentle ping. This looks quite valuable to have. Which epic would this fit best under?
This feels like a good candidate for the epic @jcagme is driving.
@markwilkie your thoughts?
I'm not sure it fits in our epic, because it would be tracking things that happen in regular builds, rather than things that happen during release.
I would say it belonged in the PKPI epic, but we've all moved off of it.
Yeah, the things that happen during release, which include publishing to nuget.org and official feeds, are already instrumented, so we can measure how long each task took. I think the goal of this epic is to measure build-to-build publishing.
Is there already data on how long each "step" takes in each ring? Assuming so, is that sufficient, @mmitche?
This should already exist for all builds in the 1ES Data Kusto tables. That's how we get data for the release pipeline.
My understanding of this issue is that @mmitche created it when we were trying to figure out why official builds for various repos were taking so long, back when we were putting the data together for the town hall. Matt suggested it would be useful to collect data during the slow-running legs, for example using @chcosta's PipelineLogger functionality, but logging the metrics as info or something similar instead of warning or error (the only two options today). Once that logging was in place, we would use the normal methods we already have to push it into Kusto, where we could query and analyze it to figure out the best things to tackle when trying to drive the publishing time down. Right now the data he's interested in isn't logged in any way, and we don't quite have a mechanism to log it (though we could just log it as warnings and use that).
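For reference, the "just log it as warnings" workaround mentioned above would boil down to emitting the standard Azure Pipelines `##vso[task.logissue]` logging command from the publishing task. The command itself (and the fact that it only accepts `warning`/`error`) is real; the message format and `PUBLISH-METRIC` prefix are an invented convention so the values could be filtered out later in Kusto.

```csharp
using System;

// Emits a timeline-visible "issue" from a task. Azure Pipelines only honors
// type=warning or type=error here, so metrics logged this way show up as
// warnings; the PUBLISH-METRIC prefix is a made-up convention for filtering.
static void LogMetricAsWarning(string name, string value)
{
    Console.WriteLine($"##vso[task.logissue type=warning]PUBLISH-METRIC {name}={value}");
}

// Example: LogMetricAsWarning("NuGetPushSeconds", "412");
```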
Yep, that's right.
I see. So the work is to perhaps add an "info" type of logging?
"Info" type logging isn't really supported by the timeline. The documentation kind of implies that it is, but it doesn't work. I've talked with AzDo quite a bit about it and was unable to inject any custom info message into the timeline. Additionally, when we previously went down the route of adding app insights type logging into the builds, we experienced significant build slowness due to network issues even though it was supposed to be a fire and forget type logging with minimal impact.
We probably already have "task"-level metrics; it's just a question of figuring out how to interpret them in a way that is meaningful.
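If the existing timeline data in the 1ES Kusto tables is enough, "interpreting it" could be as simple as a recurring query over the publish-related timeline records. This is only a sketch: the cluster, database, table, and column names below are placeholders, not the real 1ES schema; only the general Kusto.Data client pattern is standard.

```csharp
using System;
using Kusto.Data;
using Kusto.Data.Common;
using Kusto.Data.Net.Client;

class PublishDurationReport
{
    static void Main()
    {
        // Placeholder cluster/database; the real 1ES cluster and schema would need to be substituted.
        var kcsb = new KustoConnectionStringBuilder("https://<cluster>.kusto.windows.net", "<database>")
            .WithAadUserPromptAuthentication();

        // Hypothetical table/columns: a timeline-records table with the record name,
        // start time, and elapsed duration per build step.
        const string query = @"
TimelineRecords
| where RecordName startswith 'Publish'
| summarize AvgMinutes = avg(DurationMinutes) by RecordName, Day = bin(StartTime, 1d)
| order by Day asc";

        using var provider = KustoClientFactory.CreateCslQueryProvider(kcsb);
        using var reader = provider.ExecuteQuery("<database>", query, new ClientRequestProperties());
        while (reader.Read())
        {
            Console.WriteLine($"{reader["Day"]}  {reader["RecordName"]}  {reader["AvgMinutes"]}");
        }
    }
}
```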
@mmitche What do you think should happen here?
Maybe the right thing to do is to simply print out info about the publishing at the end (see the metrics in the top-level issue description; a rough sketch follows). When we see regressions in the timeline data, we can either mine or inspect those metrics. It's not super elegant, but it would work and is probably fine for our purposes.
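To make that suggestion concrete, here is a minimal sketch of the end-of-publish printout, assuming a metrics object like the `PublishingMetrics` sketch under the issue description is populated while the task runs (again, these names are illustrative, not existing Arcade APIs).

```csharp
using System;

// Hypothetical end-of-publish summary; values come from counters/stopwatches
// maintained while pushing packages and blobs. Written to the console so it
// lands in the build log next to the timeline record for the publish step.
static void PrintPublishingSummary(PublishingMetrics m)
{
    Console.WriteLine("=== Publishing metrics ===");
    Console.WriteLine($"Input files:      {m.InputFileCount}");
    Console.WriteLine($"Input size:       {m.InputBytes / (1024.0 * 1024.0):F1} MiB");
    Console.WriteLine($"Bytes uploaded:   {m.UploadedBytes / (1024.0 * 1024.0):F1} MiB");
    Console.WriteLine($"NuGet push time:  {m.NuGetPushDuration}");
    Console.WriteLine($"File push time:   {m.FilePushDuration}");
    foreach (var kvp in m.BytesPerTarget)
        Console.WriteLine($"  {kvp.Key}: {kvp.Value / (1024.0 * 1024.0):F1} MiB");
}
```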