Multiple deployments transferring the same package take up to a minute longer each
Team
- [X] I've assigned a team label to this issue
Severity
Slowing down our own deployments
Version
All
Latest Version
I could reproduce the problem in the latest build
What happened?
When there are multiple deployments deploying the same package to the same machine, each package transfer takes a minute longer to complete.
Reproduction
Have many projects deploy the same package to the same machine, and run them all concurrently.
Error and Stacktrace
No response
More Information
Our process-wide mutex on package acquisition retries every minute to acquire the mutex. If a deployment is queued behind x other deployments, it will take around x minutes before the mutex is acquired.
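To make the behaviour concrete, here's a minimal Python sketch of a fixed-interval retry like the one described above. The `try_acquire` probe, the function name and the timeout are hypothetical; only the one-minute interval comes from the behaviour described here.

```python
import time

RETRY_INTERVAL_SECONDS = 60  # assumed: the one-minute retry described above

def acquire_package_lock(try_acquire, timeout_seconds=3600):
    """Illustrative only: poll a lock at a fixed interval instead of waiting
    on a release signal. `try_acquire` is a hypothetical non-blocking probe
    that returns True once the lock is free."""
    waited = 0
    while waited < timeout_seconds:
        if try_acquire():
            return True
        # Even if the current holder releases the lock a second from now,
        # this caller won't notice until the next poll, so a queue of
        # x deployments serialises into roughly x minutes of added wait.
        time.sleep(RETRY_INTERVAL_SECONDS)
        waited += RETRY_INTERVAL_SECONDS
    return False
```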
Workaround
No response
@OctopusDeploy/cpt have been impacted by this issue in our deployment process for Octofront. Is there any update on it / plan to resolve? Thanks
@OctopusDeploy/team-octopus-cloud is also looking forward to having this fixed 🥺
Noticed this happening for downloads too, which is probably making things worse in conjunction with https://github.com/OctopusDeploy/Issues/issues/7018
We don't think this is an issue anymore. The mutex polls regularly to check if a package is still being acquired. Is anyone still encountering this problem?
@nathanwoctopusdeploy @flin-8 Is this still an issue for you?
I don't notice it anymore, so it's not an issue for us even if it's still happening 😁
Another possible occurrence https://octopusdeploy.slack.com/archives/CNHBHV2BX/p1661301347672109
It is still happening: https://octopusdeploy.slack.com/archives/CNHBHV2BX/p1661301347672109
Our hubs are currently running 2022.2.8277, and this is still a problem for us. Usually our hub deployment tasks are naturally staggered, but there are times when we need to reprovision a large number of instances at once, such as when toggling a feature flag or when moving instances to a new cluster. In these scenarios, we will trigger a number of deployment tasks to happen at the same time on the hub, but this issue means that each subsequent task is delayed by a minute more than the previous one.
You can see an example of this in these server task logs from `v2-hub-hwesteup00201`. In this example, `ServerTasks-89732`, `ServerTasks-89724` and `ServerTasks-89715` were impacted the most, with delays of around 10-12 minutes each. These delays are considerable when the task itself can complete in as little as 7 minutes.
Tasks can also be impacted multiple times, depending on when they acquire packages. `ServerTasks-89726` is an example where the first package acquisition went through quickly and then the second one was delayed.
There's a rough timeline view available to compare the timelines across the multiple tasks in ChromeTrace.json, which can be opened in the trace viewer available at chrome://tracing or edge://tracing.
This issue has come to our attention again because we're looking to improve the speed at which we can roll out changes, so we'd be keen to know when this is likely to be addressed.
This is a very fast Worker that was running 80 tasks concurrently, and one of the deployments spent 29 minutes waiting on (Another deployment is currently uploading package...)
I added some test logging and the mutex is working as expected. The problem is that the package staging takes 15-25 seconds, so with 80 tasks that amounts to 20-odd minutes because the execution is sequential.
I think we need to tweak the performance, and make checking package existence faster.
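A rough Python sketch of the kind of fast path I mean (the `package_already_staged` and `stage_package` helpers are hypothetical, not our actual code): check for the staged package before contending for the mutex and again after acquiring it, so the 15-25 second staging cost is paid once rather than once per task. Paid sequentially by all 80 tasks, 80 × 15-25 s works out to roughly 20-33 minutes, which matches the delays above.

```python
def acquire_package(package_id, lock, package_already_staged, stage_package):
    """Sketch of a check-then-lock-then-recheck fast path.

    `package_already_staged` and `stage_package` are hypothetical helpers;
    the point is that a cheap existence check before (and after) taking the
    lock lets 79 of 80 concurrent tasks skip the expensive staging step.
    """
    # Fast path: the package may already have been staged by an earlier task.
    if package_already_staged(package_id):
        return

    with lock:
        # Re-check inside the lock: another task may have finished staging
        # while this one was waiting.
        if package_already_staged(package_id):
            return
        stage_package(package_id)  # the expensive 15-25 second step
```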
Also, it's common for tasks that started early to lose out to later tasks when contending for the mutex; see how the later tasks take a lot less time and finish earlier.
See https://github.com/OctopusDeploy/Issues/issues/7957