Multiple deployments transferring the same package takes up to a minute longer

Open • tothegills opened this issue 3 years ago • 9 comments

Team

  • [X] I've assigned a team label to this issue

Severity

Slowing down our own deployments

Version

All

Latest Version

I could reproduce the problem in the latest build

What happened?

When multiple deployments deploy the same package to the same machine, each package transfer takes a minute longer to complete.

Reproduction

Have many projects deploy the same package to the same machine, then run them all concurrently:

image image image

Error and Stacktrace

No response

More Information

Our process-wide mutex on package acquisition only retries once a minute. If a deployment is queued behind x other deployments, it can take x minutes before it acquires the mutex.
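
As a rough illustration of why the delay grows by about a minute per deployment, here is a minimal simulation sketch. It is not the Octopus implementation: the function name, the 5-second hold time and the assumption that every deployment requests the mutex at the same instant are all hypothetical. Only the one-minute retry interval comes from the description above.

```python
def simulate_fixed_retry(num_deployments, hold_seconds=5.0, retry_seconds=60.0):
    """Return the time (seconds) at which each deployment acquires the mutex.

    Hypothetical model: every deployment first tries at t=0, and a waiter
    that finds the mutex held sleeps for retry_seconds before trying again.
    """
    free_at = 0.0                               # when the mutex next becomes free
    next_attempt = [0.0] * num_deployments      # next retry time per deployment
    acquired_at = [None] * num_deployments
    pending = set(range(num_deployments))

    while pending:
        # Process the earliest pending attempt; ties go to the lowest index.
        i = min(pending, key=lambda d: (next_attempt[d], d))
        now = next_attempt[i]
        if now >= free_at:
            acquired_at[i] = now
            free_at = now + hold_seconds
            pending.remove(i)
        else:
            # Mutex is held: back off for the full retry interval.
            next_attempt[i] = now + retry_seconds
    return acquired_at


if __name__ == "__main__":
    for index, t in enumerate(simulate_fixed_retry(4)):
        print(f"deployment {index} acquires the mutex at t={t:.0f}s")
```

In this model the mutex sits idle for most of each minute, because the waiters only notice it is free at their next retry. The wait is driven by the retry interval rather than by how long the package transfer actually takes.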

Workaround

No response

tothegills avatar Oct 08 '21 06:10 tothegills

@OctopusDeploy/cpt have been impacted by this issue in our deployment process for Octofront. Is there any update on it / plan to resolve? Thanks

nathanwoctopusdeploy avatar Feb 09 '22 05:02 nathanwoctopusdeploy

@OctopusDeploy/team-octopus-cloud is also looking forward to having this fixed 🥺

flin-8 avatar Feb 10 '22 04:02 flin-8

I've noticed this happening for downloads too, which is probably making things worse in conjunction with https://github.com/OctopusDeploy/Issues/issues/7018

image

N-lson avatar Feb 14 '22 23:02 N-lson

We don't think this is an issue anymore. The mutex polls regularly to check if a package is still being acquired. Is anyone still encountering this problem?

tothegills avatar Mar 31 '22 03:03 tothegills

@nathanwoctopusdeploy @flin-8 Is this still an issue for you?

ankithkonda avatar Apr 06 '22 02:04 ankithkonda

> @nathanwoctopusdeploy @flin-8 Is this still an issue for you?

I don't notice it anymore, so it's not an issue for us even if it's still happening 😁

flin-8 avatar Apr 07 '22 00:04 flin-8

Another possible occurrence https://octopusdeploy.slack.com/archives/CNHBHV2BX/p1661301347672109

veochen-octopus avatar Aug 25 '22 02:08 veochen-octopus

It is still happening: https://octopusdeploy.slack.com/archives/CNHBHV2BX/p1661301347672109

tothegills avatar Sep 27 '22 00:09 tothegills

Our hubs are currently running 2022.2.8277, and this is still a problem for us. Usually our hub deployment tasks are naturally staggered, but there are times when we need to reprovision a large number of instances at once, such as when toggling a feature flag or when moving instances to a new cluster. In these scenarios, we will trigger a number of deployment tasks to happen at the same time on the hub, but this issue means that each subsequent task is delayed by a minute more than the previous one.

You can see an example of this in these server task logs from v2-hub-hwesteup00201. In this example, ServerTasks-89732, ServerTasks-89724 and ServerTasks-89715 were impacted the most, with delays of around 10-12 minutes each. These delays are considerable when the task itself can complete in as little as 7 minutes.

Tasks can also be impacted multiple times, depending on when they acquire packages. ServerTasks-89726 is an example where the first package acquisition went through quickly and then the second one was delayed.

There's a rough timeline view available to compare the timelines across the multiple tasks in ChromeTrace.json, which can be opened in the trace viewer available at chrome://tracing or edge://tracing.
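
For anyone wanting to build a similar timeline from their own task logs, a file that the trace viewer can open might be produced along these lines. This is a sketch only: the task names and timings are placeholders rather than values from the logs above, and the attached ChromeTrace.json may be structured differently. It uses "complete" events (`"ph": "X"`) from the Trace Event Format, with timestamps in microseconds.

```python
import json


def to_chrome_trace(tasks):
    """Convert (name, start_seconds, duration_seconds) tuples to Trace Event Format."""
    events = []
    for tid, (name, start, duration) in enumerate(tasks, start=1):
        events.append({
            "name": name,
            "ph": "X",                           # complete event: start plus duration
            "ts": int(start * 1_000_000),        # start time in microseconds
            "dur": int(duration * 1_000_000),    # duration in microseconds
            "pid": 1,
            "tid": tid,                          # one row per task in the viewer
        })
    return {"traceEvents": events}


if __name__ == "__main__":
    # Placeholder data: two hypothetical tasks, the second starting a minute later.
    example = [("task-A", 0, 420), ("task-B", 60, 420)]
    with open("example-trace.json", "w") as handle:
        json.dump(to_chrome_trace(example), handle, indent=2)
```

Giving each task its own `tid` puts it on a separate row, which makes staggered starts and waits easy to compare by eye.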

This issue has come to our attention again because we're looking to improve the speed at which we can roll out changes, so we'd be keen to know when this is likely to be addressed.

x-cubed avatar Nov 15 '22 22:11 x-cubed

This is a very fast Worker that was running 80 tasks concurrently, and one of the deployments spent 29 minutes waiting on "Another deployment is currently uploading package..."

image

I added some test logging, and the mutex is working as expected. The problem is that package staging takes 15-25 seconds per task, so with 80 tasks that amounts to 20-odd minutes because the execution is sequential.
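
The back-of-the-envelope arithmetic, as a small sketch (the function name is just for illustration):

```python
def total_staging_minutes(task_count, staging_seconds):
    """Total time to stage every task when staging runs one at a time."""
    return task_count * staging_seconds / 60


if __name__ == "__main__":
    best = total_staging_minutes(80, 15)
    worst = total_staging_minutes(80, 25)
    print(f"80 tasks staged sequentially: {best:.1f} to {worst:.1f} minutes")
```

With 15-25 seconds per task that works out to roughly 20 to 33 minutes for the whole batch, so the 29 minutes seen above falls inside that range.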

I think we need to tweak the performance, and make checking package existence faster.

flin-8 avatar Dec 13 '22 06:12 flin-8

Also, it's common for tasks that started early to lose out to later tasks when contesting for the mutex. Note how later tasks take much less time and finish earlier:
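
To illustrate the effect, here is a small simulation sketch comparing a lock that is granted to an arbitrary waiter with one that is granted in arrival order. Everything in it is an assumption for illustration: how the real mutex chooses the next holder isn't shown in this thread, and the function and parameter names are made up.

```python
import random


def acquisition_order(arrivals, hold_seconds, fair, seed=0):
    """Return task indices in the order they acquire a single lock.

    arrivals: time (seconds) at which each task starts waiting.
    fair=True grants the lock to the longest waiter (FIFO);
    fair=False grants it to a randomly chosen current waiter.
    """
    rng = random.Random(seed)
    # Keep pending tasks sorted by arrival so pending[0] is always the oldest.
    pending = sorted(range(len(arrivals)), key=lambda i: (arrivals[i], i))
    now = 0.0
    order = []
    while pending:
        waiting = [i for i in pending if arrivals[i] <= now]
        if not waiting:
            now = arrivals[pending[0]]   # lock is idle until the next arrival
            continue
        chosen = waiting[0] if fair else rng.choice(waiting)
        pending.remove(chosen)
        order.append(chosen)
        now += hold_seconds              # the holder keeps the lock for hold_seconds
    return order


if __name__ == "__main__":
    arrivals = [0, 1, 2, 3, 4, 5]
    print("arbitrary:", acquisition_order(arrivals, hold_seconds=20, fair=False))
    print("FIFO:     ", acquisition_order(arrivals, hold_seconds=20, fair=True))
```

With arbitrary selection an early task can be passed over repeatedly, whereas FIFO ordering bounds each task's wait by the work queued ahead of it.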

image

flin-8 avatar Dec 13 '22 07:12 flin-8

See https://github.com/OctopusDeploy/Issues/issues/7957

N-lson avatar Dec 15 '22 01:12 N-lson