
AzOps - Discovery Performance Issues

Open · reckitt-maciejglowacki opened this issue 4 years ago · 18 comments

The AzOps - Pull pipeline from the AzOps Accelerator, run in Azure DevOps, fails to retrieve information about all of the subscriptions and times out.

The SPN has been granted privileges over the root management group, which contains about 250 subscriptions. The build times out after 4 hours, which is the limit I've set in the pipeline itself. I'm actually not sure whether it is doing anything for that long (or just hangs midway), because the log file is too large to browse effectively.

Here's a screenshot: [Screenshot: 2021-10-06_15h40_06]

Has anyone experienced anything like that? Is this tool designed to handle this many subscriptions? Or could it be a problem with the DevOps agent pool? Any help will be much appreciated.

reckitt-maciejglowacki, Oct 06 '21

Hey @reckitt-maciejglowacki, we have customers with over 1000 subscriptions running this, so it should definitely work. Can you share how the settings below are configured in your settings.json file? [Screenshot of the settings in question]

daltondhcp, Oct 08 '21

I'm using defaults from https://github.com/Azure/AzOps-Accelerator/blob/main/settings.json

[Screenshot: 2021-10-11_09h50_19]

The only thing that I have changed is timeoutInMinutes in the pipeline itself.

reckitt-maciejglowacki, Oct 11 '21

Thank you. For troubleshooting purposes, could you please try and change the Core.SkipResourceGroup setting to true and report back the results?
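For reference, a minimal Python sketch of that change. It assumes settings.json is a flat JSON object keyed by dotted setting names such as "Core.SkipResourceGroup" (which matches how the setting is quoted later in this thread); the helper name set_skip_resource_group is hypothetical, and editing the file by hand works just as well:

```python
import json
from pathlib import Path


def set_skip_resource_group(settings_path: str, skip: bool = True) -> dict:
    """Hypothetical helper: set Core.SkipResourceGroup in an AzOps settings.json.

    Assumes the file is a flat JSON object whose keys are dotted setting names.
    """
    path = Path(settings_path)
    settings = json.loads(path.read_text(encoding="utf-8"))
    if "Core.SkipResourceGroup" not in settings:
        # Fail loudly instead of silently adding a key the file may not expect.
        raise KeyError("Core.SkipResourceGroup not found; is this an AzOps settings.json?")
    settings["Core.SkipResourceGroup"] = skip
    # Keep indentation so the change shows up as a small diff in source control.
    path.write_text(json.dumps(settings, indent=4) + "\n", encoding="utf-8")
    return settings
```

Calling set_skip_resource_group("settings.json") flips the flag to true; passing skip=False turns resource group discovery back on.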

daltondhcp, Oct 11 '21

That certainly helped :) The pipeline now runs for about 2 hours, but it still fails due to #439.

Are there any disadvantages to skipping resource group discovery?

reckitt-maciejglowacki, Oct 12 '21

You would only want RG discovery if you intend to do RG level deployments (like VMs or other resources) with AzOps, which I assume is not the intent here?

daltondhcp, Oct 12 '21

It's not but we do want to be able to differentiate policy and role assignments between different resource groups.

reckitt-maciejglowacki, Oct 12 '21

Are you going to manage that from a central platform perspective via AzOps or let the individual LZ teams do it?

daltondhcp, Oct 13 '21

We're doing it centrally, I'm afraid.

reckitt-maciejglowacki, Oct 13 '21

Understood. Can you try changing the setting back so that RGs are discovered, increase the pipeline timeout in ADO to 6 hours, and see if it completes successfully?

daltondhcp, Oct 13 '21

Okay. I'll do that today and let you know the results.

reckitt-maciejglowacki, Oct 18 '21

Same :(

[Screenshot: 2021-10-19_08h58_59]

reckitt-maciejglowacki, Oct 19 '21

Thank you for confirming this. We will take a look and see what we can do. For now, our advice is to disable resource group discovery.

daltondhcp, Oct 21 '21

Hi @daltondhcp, just wanted to check: have you managed to look into this issue? Thanks.

reckitt-maciejglowacki, Nov 04 '21

@daltondhcp bump

reckitt-maciejglowacki, Nov 16 '21

Hey @reckitt-maciejglowacki, we are currently working on this, but unfortunately there is no short-term fix. I will make sure to post progress updates in this issue.

daltondhcp, Nov 17 '21

Got it. Thanks for the info.

reckitt-maciejglowacki, Nov 18 '21

Hi, any update on this?

reckitt-maciejglowacki, Dec 13 '21

Hey @reckitt-maciejglowacki,

Unfortunately we are still investigating this, but as a workaround for now you could use self-hosted agents, which have unlimited run time as per: https://docs.microsoft.com/en-us/azure/devops/pipelines/process/phases?view=azure-devops&tabs=yaml#timeouts

Guidance on creating self-hosted agents can be found here: https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/agents?view=azure-devops&tabs=browser#install

Hope that helps move you forward in the near term 👍
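To reason about which limit a job actually hits, here is a hypothetical Python helper. The caps it encodes are our reading of the timeouts page linked above: a timeoutInMinutes of 0 means "the maximum", which is unlimited on self-hosted agents, 360 minutes on Microsoft-hosted agents for public projects or paid parallel jobs, and 60 minutes on the free tier for private projects. Treat these values as assumptions and check the page for current limits; we also assume a larger requested value is effectively capped rather than rejected.

```python
from typing import Optional

# Assumed Microsoft-hosted job caps in minutes; verify against the current
# Azure Pipelines timeouts documentation before relying on them.
HOSTED_CAP_PAID_OR_PUBLIC = 360
HOSTED_CAP_FREE_PRIVATE = 60


def effective_timeout_minutes(
    requested: int, self_hosted: bool, paid_or_public: bool = True
) -> Optional[int]:
    """Return the run time a job effectively gets, or None for 'no limit'.

    `requested` mirrors timeoutInMinutes in the pipeline YAML; 0 means "maximum".
    """
    if requested < 0:
        raise ValueError("timeoutInMinutes cannot be negative")
    if self_hosted:
        # Self-hosted: 0 means run forever, otherwise the requested value applies.
        return None if requested == 0 else requested
    cap = HOSTED_CAP_PAID_OR_PUBLIC if paid_or_public else HOSTED_CAP_FREE_PRIVATE
    # Microsoft-hosted: 0 means "use the cap"; anything larger is assumed capped.
    return cap if requested == 0 else min(requested, cap)
```

For example, effective_timeout_minutes(6 * 60, self_hosted=False) returns 360, so under these assumptions a 6-hour timeout already sits at the Microsoft-hosted ceiling, while effective_timeout_minutes(0, self_hosted=True) returns None (no limit).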

jtracey93, Dec 14 '21

Hi. Just wanted to let you know that this update definitely hasn't fixed anything. Quite the opposite.

I'm getting various random errors when trying to run this in an ADO pipeline. Even when it does run uninterrupted (which seems completely random), it times out after an hour.

[Three screenshots of the errors]

reckitt-maciejglowacki, Feb 21 '23

Hi @reckitt-maciejglowacki, thanks for updating this issue and sharing.

I can confirm your experience regarding a range of different errors ultimately causing pipeline executions to fail.

We started seeing this as well once we released 2.0.0 into the wild, and determined that the majority of the different errors are due to the expanded use of parallel processing. When an execution machine combines a "high" throttle limit with a "low" number of cores, the errors start to show up a lot.

Our response was to implement logic in the module that detects these misalignments and overrides the throttle limit when one is detected. In addition, we created a wiki page on performance considerations.
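The idea of detecting a throttle limit that is out of line with the core count, and overriding it, can be sketched in Python as follows. This is not the module's actual rule, which isn't described here; the function name and the two-workers-per-core ratio are assumptions chosen purely for illustration.

```python
import os
from typing import Optional


def aligned_throttle_limit(requested: int, cores: Optional[int] = None, per_core: int = 2) -> int:
    """Hypothetical rule: allow at most `per_core` parallel workers per logical core.

    Returns the requested limit unchanged when it fits, otherwise the capped value.
    """
    if requested < 1 or per_core < 1:
        raise ValueError("requested and per_core must be at least 1")
    if cores is None:
        # os.cpu_count() may return None on some platforms; fall back to one core.
        cores = os.cpu_count() or 1
    cores = max(1, cores)
    ceiling = cores * per_core
    if requested > ceiling:
        # The misalignment case: a "high" throttle limit on a "low" core count.
        print(f"Throttle limit {requested} exceeds {ceiling} for {cores} core(s); overriding.")
        return ceiling
    return requested
```

With these assumptions, aligned_throttle_limit(20, cores=2) is overridden to 4, while aligned_throttle_limit(3, cores=8) is left at 3. Capping relative to the core count, rather than to a fixed number, keeps the same setting usable on both small hosted agents and larger self-hosted machines.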

Starting with release 2.0.2, the AzOps module includes improvements intended to resolve this behavior.
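To check whether a given AzOps version (for example the AZOPS_MODULE_VERSION value used in the pipeline) already includes these changes, a numeric comparison against 2.0.2 is enough. A small Python sketch; the helper names are hypothetical, and pre-release suffixes such as "-beta" are not handled:

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'X.Y.Z' (optionally prefixed with 'v') into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def is_at_least(version: str, minimum: str = "2.0.2") -> bool:
    """True if version >= minimum; tuples compare numerically, element by element."""
    return parse_version(version) >= parse_version(minimum)


print(is_at_least("2.1.2"))  # True
print(is_at_least("2.0.0"))  # False
```

Comparing integer tuples rather than raw strings avoids lexical surprises such as "10.0.0" sorting before "2.0.2".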

Could you confirm whether you still have these issues on the latest release? (If yes, let's re-open the issue.)

Jefajers, Mar 17 '23

Thank you @Jefajers. The latest update does seem to work. I haven't tried it at the resource level yet, but it runs well for subscriptions and resource groups.

reckitt-maciejglowacki, Apr 25 '23

Turns out my enthusiasm was premature...

[Screenshot]

reckitt-maciejglowacki, Apr 27 '23

Can you share the details of the errors? Same as before or something else?

daltondhcp, Apr 27 '23

The same I think:

[Four screenshots of the errors]

These seem to appear above a certain number of objects, but I haven't drilled down into it yet.

We're using AZOPS_MODULE_VERSION 2.1.2 and essentially the default settings.json from the AzOps-Accelerator project, with "Core.SkipResourceGroup": false.

reckitt-maciejglowacki, Apr 28 '23