AzOps - Discovery Performance Issues

The AzOps - Pull pipeline from the AzOps Accelerator, run in Azure DevOps, fails to retrieve information about all of the subscriptions and times out.

The SPN has been given privileges over the root management group, which contains about 250 subscriptions. The build times out after 4 hours, which is the limit I've set in the pipeline itself. I'm not sure whether it is actually working for that long or just hangs midway, because the log file is too large to browse effectively.

Here's a screenshot: 2021-10-06_15h40_06

Has anyone experienced anything like that? Is this tool designed to handle so many subscriptions? Or maybe is it some problem with DevOps pool? Any help will be much appreciated.

Oct 06 '21 13:10 reckitt-maciejglowacki

Hey @reckitt-maciejglowacki, we have customers with over 1000 subscriptions running this, so it should definitely work. Can you share how the below settings are configured in the settings.json file?

Oct 08 '21 11:10 daltondhcp

I'm using defaults from https://github.com/Azure/AzOps-Accelerator/blob/main/settings.json

The only thing that I have changed is timeoutInMinutes in the pipeline itself.

Oct 11 '21 07:10 reckitt-maciejglowacki

Thank you. For troubleshooting purposes, could you please try and change the Core.SkipResourceGroup setting to true and report back the results?
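For reference, a minimal sketch of that change, assuming the flat key layout used in the AzOps-Accelerator settings.json (all other keys omitted here):

```json
{
  "Core.SkipResourceGroup": true
}
```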

Oct 11 '21 12:10 daltondhcp

That certainly helped :) The pipeline now runs for just about 2 hours, but it still fails due to #439

Are there any disadvantages to skipping rg discovery?

Oct 12 '21 06:10 reckitt-maciejglowacki

You would only want RG discovery if you intend to do RG level deployments (like VMs or other resources) with AzOps, which I assume is not the intent here?

Oct 12 '21 06:10 daltondhcp

It's not but we do want to be able to differentiate policy and role assignments between different resource groups.

Oct 12 '21 07:10 reckitt-maciejglowacki

Are you going to manage that from a central platform perspective via AzOps or let the individual LZ teams do it?

Oct 13 '21 12:10 daltondhcp

We're doing it centrally I'm afraid

Oct 13 '21 12:10 reckitt-maciejglowacki

Understood. Can you try to change back the setting to discover RGs and change the pipeline timeout in ADO to 6 hrs and see if it completes successfully?
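As a rough sketch, the job timeout is set with `timeoutInMinutes` in the pipeline YAML. The job name and step below are placeholders, not taken from the Accelerator pipeline:

```yaml
jobs:
  - job: pull                # placeholder job name
    timeoutInMinutes: 360    # 6 hours
    steps:
      - script: echo "AzOps pull steps go here"   # placeholder step
```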

Oct 13 '21 14:10 daltondhcp

Okay. I'll do that today and let you know the results.

Oct 18 '21 09:10 reckitt-maciejglowacki

Same :(

Oct 19 '21 07:10 reckitt-maciejglowacki

Thank you for confirming this. We will take a look at this and see what we can do. The advice would be to disable resource group discovery for now.

Oct 21 '21 07:10 daltondhcp

Hi @daltondhcp, just wanted to check whether you have managed to look into this issue. Thanks

Nov 04 '21 12:11 reckitt-maciejglowacki

@daltondhcp bump

Nov 16 '21 14:11 reckitt-maciejglowacki

Hey @reckitt-maciejglowacki - we are currently working on this, but unfortunately there is no short-term fix. I will make sure to keep the progress updated in this issue.

Nov 17 '21 08:11 daltondhcp

Got it. Thanks for the info.

Nov 18 '21 09:11 reckitt-maciejglowacki

Hi, any update on this?

Dec 13 '21 11:12 reckitt-maciejglowacki

Hey @reckitt-maciejglowacki,

Unfortunately we are still investigating this, but as a workaround for now you could use self-hosted agents, which have an unlimited run time as per: https://docs.microsoft.com/en-us/azure/devops/pipelines/process/phases?view=azure-devops&tabs=yaml#timeouts

Guidance on creating self-hosted agents can be found here: https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/agents?view=azure-devops&tabs=browser#install

Hope that helps move you forward in the near term 👍

Dec 14 '21 14:12 jtracey93

Hi. Just wanted to let you know that this update definitely hasn't fixed anything. Quite the opposite.

I'm getting various random errors when trying to execute this in an ADO pipeline. Even when it does run uninterrupted (which seems completely random), it times out after an hour.

Feb 21 '23 10:02 reckitt-maciejglowacki

Hi @reckitt-maciejglowacki, thanks for updating this issue and sharing.

I agree with your experience in regards to a bunch of different errors ultimately causing pipeline executions to fail.

We started seeing this as well once we released 2.0.0 into the wild, and determined that the majority of the different errors are due to the expanded use of parallel processing. When combined with an execution machine that has a "high" throttle limit and a "low" number of cores, the errors start to show up a lot.

Our response to this was to implement logic in the module to detect these misalignments and override the throttle limit when they are detected. In addition to that, we created a wiki page on performance considerations.
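To illustrate the idea only (this is not the AzOps module's actual code, which is PowerShell), here is a minimal Python sketch of capping a requested throttle limit at the number of available cores. The function name `effective_throttle_limit` and the "cap at core count" rule are assumptions made for the example:

```python
import os
from typing import Optional


def effective_throttle_limit(requested: int, cores: Optional[int] = None) -> int:
    """Return a throttle limit that never exceeds the available core count.

    Hypothetical helper: the cap-at-core-count rule is an assumption for
    illustration, not the rule AzOps itself applies.
    """
    if requested < 1:
        raise ValueError("requested throttle limit must be at least 1")
    if cores is None:
        # os.cpu_count() can return None, so fall back to a single core.
        cores = os.cpu_count() or 1
    # Override the requested value when it is higher than the core count,
    # but never go below 1 so that work can still proceed.
    return max(1, min(requested, cores))


if __name__ == "__main__":
    # A "high" throttle limit on a "low" core machine gets overridden.
    print(effective_throttle_limit(10, cores=2))
    # A limit that already fits the machine is left alone.
    print(effective_throttle_limit(3, cores=8))
```

The point of the sketch is that the override only kicks in when the configured parallelism and the machine are misaligned; a setting that already fits the machine passes through unchanged.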

Since release 2.0.2, the AzOps module includes improvements intended to resolve this behavior.

Could you confirm whether you still have these issues on the latest release? (If yes, then let's re-open the issue.)

Mar 17 '23 09:03 Jefajers

Thank you @Jefajers. The latest update does seem to work. I haven't tried it at the resource level yet, but it runs well for subscriptions and resource groups.

Apr 25 '23 12:04 reckitt-maciejglowacki

Turns out my enthusiasm was premature..

Apr 27 '23 11:04 reckitt-maciejglowacki

Can you share the details of the errors? Same as before or something else?

Apr 27 '23 17:04 daltondhcp

The same I think:

Those seem to appear above a certain number of objects, but I haven't drilled into it yet.

We're using AZOPS_MODULE_VERSION 2.1.2 and pretty much the default settings.json from the AzOps-Accelerator project, with "Core.SkipResourceGroup": false

Apr 28 '23 10:04 reckitt-maciejglowacki
