dependabot-core
dependabot-core copied to clipboard
Timeout running job when updating NuGet packages after updater logic was re-written in C#
Is there an existing issue for this?
- [X] I have searched the existing issues
Package ecosystem
NuGet
Package manager version
No response
Language version
No response
Manifest location and content before the Dependabot update
No response
dependabot.yml content
No response
Updated dependency
No response
What you expected to see, versus what you actually saw
Hi! Since the release of https://github.blog/changelog/2023-11-28-improvements-to-nuget-support-for-dependabot/ Dependabot stopped working in our non-tiny mono repository. Before that, it had been working for more than half a year.
proxy | 2024/03/26 05:40:57 proxy starting, commit: cf8623577dad71c128f219df2b27df6de35b909d
...
updater | 2024/03/26 05:41:06 INFO <job_805408216> Starting job processing
...
updater | file: Directory.Packages.props, metadata:
updater | name: Microsoft.Build.Traversal, version: 4.1.0
updater | file: Build.IntegrationTests.proj, metadata:
updater | 2024/03/26 05:48:58 INFO <job_805408216> Checking if prometheus-net 8.2.1 needs updating
updater | 2024/03/26 05:48:59 INFO <job_805408216> Filtered out 128 pre-release versions
proxy | 2024/03/26 05:48:59 [227] GET https://nuget.***.dev:443/nuget/Packages(Id='prometheus-net',Version='8.2.1')
proxy | 2024/03/26 05:48:59 [227] * authenticating nuget feed request (host: nuget.***.dev, basic auth)
proxy | 2024/03/26 05:48:59 [227] 404 https://nuget.***.dev:443/nuget/Packages(Id='prometheus-net',Version='8.2.1')
proxy | 2024/03/26 05:48:59 [229] GET https://api.nuget.org:443/v3-flatcontainer/prometheus-net/8.2.1/prometheus-net.nuspec
proxy | 2024/03/26 05:48:59 [229] 200 https://api.nuget.org:443/v3-flatcontainer/prometheus-net/8.2.1/prometheus-net.nuspec
proxy | 2024/03/26 05:49:57 Posting metrics to remote API endpoint
proxy | 2024/03/26 05:49:57 Successfully posted metrics data via api client
updater | time="2024-03-26T06:28:30Z" level=info msg="task complete" container_id=job-805408216-updater exit_code=137 job_id=805408216 step=updater
updater | time="2024-03-26T06:28:32Z" level=warning msg="timeout running job" error="waiting for updater: waiting for container: context deadline exceeded" job_id=805408216
What is going on between 05:49:57
and 06:28:32
?
Native package manager behavior
No response
Images of the diff or a link to the PR, issue, or logs
No response
Smallest manifest that reproduces the issue
No response
Today's log
...
proxy | 2024/04/16 11:11:12 [369] 200 https://api.nuget.org:443/v3-flatcontainer/system.security.cryptography.xml/index.json
proxy | 2024/04/16 11:12:02 Posting metrics to remote API endpoint
proxy | 2024/04/16 11:12:02 Successfully posted metrics data via api client
updater | time="2024-04-16T11:52:13Z" level=info msg="task complete" container_id=job-815472451-updater exit_code=137 job_id=815472451 step=updater
updater | time="2024-04-16T11:52:15Z" level=warning msg="timeout running job" error="waiting for updater: waiting for container: context deadline exceeded" job_id=815472451
Nothing between 11:12:02
and 11:52:15
...
@brettfo, hi! Do you have any ideas or know someone who can help?
@batkaevruslan Two questions for you:
- Can you share the full log? I want to see what steps have run already.
- Is this repo public, or if not, do you have a public repo where the same timeout occurs?
There's obviously something very slow happening and I'd like to experiment a bit to see where we could potentially speed things up. There are some instances, however, where there's not much we can do, e.g., if dependabot is running against a huge repo and trying to process all updates. The C# rewrite calls into the real NuGet and dotnet code which does much more checking, etc., and is certainly slower than the original code which was a sort of best-guess.
@brettfo ,
- https://pastebin.com/9vp33hNk
- it is private and I can't share it. I don't have a public repro.
Can you add some more logs to see what is going between 11:12:02 and 11:52:15 ?
@brettfo , I created a solution with one project. That project references almost all NuGet packages that we have in our product solution. Almost all, because I removed ~20 packages that are stored in private repositories. https://github.com/batkaevruslan/dependabotTimeoutRepro/blob/main/DependabotTimeoutRepro/DependabotTimeoutRepro.A/DependabotTimeoutRepro.A.csproj
And there was a timeout. There is also some unknown error, but I don't see it in logs...
P.S. the update log is so huge that GitHub fails to return it (50x errors) and if GitHub returns a response, Chrome fails to render it :) At the second attempt (816330767), Dependabot created a couple of PR, but there still was "Unknown error" https://github.com/batkaevruslan/dependabotTimeoutRepro/pulls
@batkaevruslan Thank you for this, I'll start experimenting.
@brettfo , hi! I've seen your pr https://github.com/dependabot/dependabot-core/pull/9566 and run Dependabot update on my public repo: at least, GH does not fail rendering the update log :) Timeouts are still here. Log show 5k+ requests to POST https://dc.services.visualstudio.com:443/v2/track
from 05:31:17
till 06:14:56
Can this tracking influence performance?
The telemetry calls shouldn't be the determining factor in the timeouts, but I'll keep experimenting with your repo.
For a quick follow-up, I removed the telemetry calls and started running an update on the sample repo provided and right now it's still going at 4 hours. There will certainly be more work to get the time down, but at some point we'll hit a wall because much of the work now is done either directly with the dotnet
CLI tool or indirectly through the NuGet APIs. The result is that it's slower, but (hopefully) more correct.
@brettfo, thanks for update! When we eventually hit the wall:
- Can you extend the timeout limits for NuGet updater, as they "are slow by definition"?
- Do you know what we can do on our side to fit the limit, except reducing the number of dependencies? Can we somehow change dependabot.yml to split update job?
@batkaevruslan I'm not aware of any setting to change the max run length, but given how long I've left this running locally with no end in sight, I don't think that's a viable option.
As for modifying the dependabot.yml, you should be able to split it up a bit by dependency name. I haven't tried this yet, but you might be able to do something like this:
version: 2
registries:
public-nuget:
type: nuget-feed
url: https://api.nuget.org/v3/index.json
updates:
- package-ecosystem: nuget
directory: "/"
registries:
- public-nuget
schedule:
interval: daily
time: "07:00"
timezone: "America/Los_Angeles"
open-pull-requests-limit: 15
groups:
# each group here should run as a separate update job
MicrosoftExtensions: # this group name is purely for you
patterns:
- "Microsoft.Extensions.*" # here we group a few dependencies at a time
FluentAssertions:
patterns:
- "FluentAssertions" # you can list multiple patterns
- "FluentAssertions.*"
# add any other groups you see fit
Just for history: using grouping does not help to avoid timeouts. My public repository does use grouping: https://github.com/batkaevruslan/dependabotTimeoutRepro/blob/main/.github/dependabot.yml#L16
An attempt to use several - package-ecosystem: "nuget"
sections with different allowed
packages and different schedules failed with
Update configs must have a unique combination of 'package-ecosystem', 'directory', and 'target-branch'
at some point we'll hit a wall because much of the work now is done either directly with the dotnet CLI tool or indirectly through the NuGet APIs. The result is that it's slower,
I'm also seeing this on a private repo and I'm curious about this comment. Did the old solution not have to call make similar API calls to determine out of date dependencies?
When I run dotnet list package --outdated --include-transitive
on the example solution's directory I get the list of packages in less than 30 seconds. I traced requests to api.nuget.org and I see about 300 requests to get the index and then paged data via GET /v3/registration5-gz-semver2/{package_name}/*
.
I know dependabot collects release notes and the commit diff. Is it this extra information that causes the wall to be hit?
@brettfo, hi! Any news on the issue?
I was wondering if our product solution is the largest in the world and the only one suffering from timeouts after this release...
- I forked eShopOnContaines (demo project from MS): https://github.com/batkaevruslan/eShop
- Ran Dependabot and got timeouts
@batkaevruslan This is something we're actively working on, but the short version is that we don't want to generate PRs that will fail the end user's CI and with the example you shared, updating the requested package resulted in one or more NU1605
/NU1608
warnings which indicates a conflict with the package versions specified. Currently the approach is an unsophisticated brute force search of trying to resolve all of those warnings and that is really slow. Our current work is trying to smartly trim down the package search space, but like all good bugs, fixing one issue surfaces 6 more.
updating the requested package resulted in one or more NU1605/NU1608
Looks like you are about my another issue https://github.com/dependabot/dependabot-core/issues/6710
Currently the approach is an unsophisticated brute force search of trying to resolve all of those warnings and that is really slow.
Is it possible to build a graph of all dependencies once and create PRs only for leaf level dependencies? Even if PR grouping feature of Dependabot won't work, it is better for us to have one PR per leaf dependency every day than no PRs at all.
Same issue here. After 60 minutes, Azure is killing our Dependabot run due to timeout and it is not possible with our tier to increase the timeout to more then 60 minutes. Also, I am not sure what Dependabot is doing: we see 20.000 lines of GET/POST log entries in our pipeline log.