prototype a performance redesign of the AAD provider

Open • tkol2022 opened this issue 1 year ago • 4 comments

💡 Summary

Perform hands-on prototyping with multi-threading and other techniques to determine whether redesigning specific code blocks can significantly improve the performance of the AAD provider.

Motivation and context

Currently, the AAD provider is relatively slow. It has to fetch information about the users and groups assigned to privileged roles, and the current code invokes MS Graph API calls several times within multiple loops.

Implementation notes

Develop and test various alternative implementation techniques for the Get-PrivilegedUser and Get-PrivilegedRole functions. Perform profiling to compare the execution times of the alternative implementations against the current code. Keep the profiling data in a spreadsheet for reference.
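A minimal sketch of such a profiling harness, written in Python only for illustration (the provider itself is PowerShell). It times each candidate implementation several times, reports the first run separately from later runs (so one-time costs stay visible), and emits CSV that a spreadsheet can open. The function names, labels, and column names are hypothetical, and the `sleep` calls are placeholders for real provider exports.

```python
import csv
import io
import statistics
import time
from typing import Callable, Dict, List


def profile_implementations(
    impls: Dict[str, Callable[[], object]], runs: int = 5
) -> List[dict]:
    """Time each candidate implementation `runs` times and summarize it."""
    rows = []
    for label, fn in impls.items():
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            fn()
            timings.append(time.perf_counter() - start)
        # Keep the first run separate: one-time costs such as module
        # imports only show up there.
        later = timings[1:] or timings
        rows.append({
            "implementation": label,
            "runs": runs,
            "first_s": round(timings[0], 3),
            "later_median_s": round(statistics.median(later), 3),
            "min_s": round(min(timings), 3),
        })
    return rows


def to_csv(rows: List[dict]) -> str:
    """Render the summary rows as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical stand-ins; in practice each callable would run one
    # full provider export with a given implementation.
    summary = profile_implementations(
        {"current": lambda: time.sleep(0.02), "candidate": lambda: time.sleep(0.01)},
        runs=3,
    )
    print(to_csv(summary))
```

Separating the first run from the median of later runs matters here because a cold start and a warm start can differ substantially.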

Some potential alternative implementation techniques that may be considered are listed below. This is not an exhaustive list; the assignee can experiment with techniques not listed here.

  • Using a PowerShell RunspacePool to perform concurrent Graph API requests
  • Using MS Graph JSON batching
  • Replacing the existing PowerShell cmdlets with direct calls to the Graph API via Invoke-MgGraphRequest
  • Any ideas not covered here
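The batching and concurrency ideas can be combined. The sketch below, written in Python purely to illustrate the technique, groups Graph requests into JSON `$batch` bodies of at most 20 requests (Graph's per-batch limit) and sends several batches in parallel with a thread pool, which plays a role similar to a RunspacePool. `build_batches`, `run_batches`, and `fake_send` are hypothetical names, the `/users/user-N` URLs are made up, and authentication and per-request error handling are left out. In the provider, each batch body would instead be POSTed to `https://graph.microsoft.com/v1.0/$batch`, for example with Invoke-MgGraphRequest.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Microsoft Graph accepts at most 20 requests in a single $batch call.
MAX_BATCH_SIZE = 20


def build_batches(urls: List[str], size: int = MAX_BATCH_SIZE) -> List[dict]:
    """Split relative Graph URLs into JSON batch request bodies.

    Request ids are unique across all batches so that responses, which
    Graph may return in any order, can be matched back to their URLs.
    """
    batches = []
    for start in range(0, len(urls), size):
        chunk = urls[start:start + size]
        batches.append({
            "requests": [
                {"id": str(start + i), "method": "GET", "url": url}
                for i, url in enumerate(chunk)
            ]
        })
    return batches


def run_batches(
    urls: List[str], send: Callable[[dict], dict], max_workers: int = 4
) -> Dict[str, dict]:
    """Send the batches concurrently and return {url: individual response}."""
    batches = build_batches(urls)
    id_to_url = {req["id"]: req["url"] for b in batches for req in b["requests"]}
    results: Dict[str, dict] = {}
    # Threads suit this I/O-bound work; each worker posts one batch at a time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for reply in pool.map(send, batches):
            for resp in reply["responses"]:
                results[id_to_url[resp["id"]]] = resp
    return results


def fake_send(batch: dict) -> dict:
    """Hypothetical transport standing in for POST .../v1.0/$batch.

    Echoes each URL back, in reverse order, to mimic out-of-order replies.
    """
    return {
        "responses": [
            {"id": r["id"], "status": 200, "body": {"url": r["url"]}}
            for r in reversed(batch["requests"])
        ]
    }


if __name__ == "__main__":
    user_urls = [f"/users/user-{n}" for n in range(45)]  # hypothetical ids
    replies = run_batches(user_urls, fake_send)
    print(len(build_batches(user_urls)), "batches,", len(replies), "responses")
```

Concurrent batches still count against the same tenant's throttling limits, so `max_workers` would probably need tuning against real measurements.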

Acceptance criteria

  • [ ] All techniques have been developed and tested with their profiling data collected

tkol2022 • Jan 18 '24 22:01

To recap progress so far, I shifted (almost) all the provider export functions over to using Invoke-MgGraphRequest instead of the PowerShell Graph SDK cmdlets.

The results make the situation pretty clear. Using functions from the Graph SDK, the first run of the provider export takes about 3-4 minutes, and each run after that takes about 1:08. With substantially all the Graph SDK usage replaced by Invoke-MgGraphRequest and direct API access, the first run and subsequent runs both take about 1:08 each. This data tells a very definite story that also aligns with our own observations: about 60% of the initial run's duration consists of slow Graph SDK module imports, which evidently occur the first time a given cmdlet is used. Once these imports are complete, there is basically no advantage to calling the Graph API directly over using the cmdlets.
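The 60% figure can be sanity-checked from these timings, assuming the extra first-run time is entirely one-time overhead and the remaining work costs the same on every run. A small Python sketch of the arithmetic (the function name is illustrative only):

```python
def one_time_share(first_run_s: float, steady_run_s: float) -> float:
    """Fraction of the first run's duration that later runs do not repeat.

    Assumes the extra first-run time is purely one-time overhead (such as
    module imports) and that the rest of the work costs the same every run.
    """
    if not 0 < steady_run_s <= first_run_s:
        raise ValueError("expected 0 < steady_run_s <= first_run_s")
    return (first_run_s - steady_run_s) / first_run_s


steady = 68.0  # 1:08 expressed in seconds
for first in (180.0, 240.0):  # first runs of 3 and 4 minutes
    print(f"first run {first:.0f} s: {one_time_share(first, steady):.0%} one-time overhead")
# prints:
# first run 180 s: 62% one-time overhead
# first run 240 s: 72% one-time overhead
```

Under that assumption, the one-time share falls between roughly 62% and 72%, so the 60% estimate is, if anything, on the low side.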

twneale • Apr 01 '24 16:04

Furthermore, of the remaining approximately 1:08 required to run the AAD provider export, at least two thirds is due to consistent server-side delays. These delays come from endpoints that do not show the same delays when the commands are rerun individually in the terminal, yet their overall impact is consistent whenever the code runs, which suggests that server-side rate limiting may be the cause.
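If the delays are indeed throttling, Graph normally signals it with HTTP 429 Too Many Requests and a `Retry-After` header. The sketch below, in Python and assuming throttling is the cause (not yet confirmed in this thread), retries only on 429, honors `Retry-After`, and records how long the request spent waiting. `get_with_retry` and the injected `send` and `sleep` callables are hypothetical; `send` stands in for a single Graph GET.

```python
import time
from typing import Callable, Dict, Tuple

# (status code, response headers, parsed JSON body)
Response = Tuple[int, Dict[str, str], dict]


def get_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict, float]:
    """Call `send` until it is not throttled, honoring Retry-After.

    Returns (status, body, seconds spent waiting on throttling). Only
    HTTP 429 is retried; any other status is handed back to the caller.
    """
    waited = 0.0
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429:
            return status, body, waited
        if attempt == max_attempts:
            break
        # Header names are case-insensitive in HTTP, so normalize them.
        lowered = {k.lower(): v for k, v in headers.items()}
        # Use the server's Retry-After (seconds) when present, otherwise
        # fall back to exponential backoff.
        delay = float(lowered.get("retry-after", 2 ** attempt))
        waited += delay
        sleep(delay)
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```

Summing the returned wait time per endpoint over an export run would indicate how much of the delay is throttling rather than slow responses.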

twneale • Apr 01 '24 17:04

Excellent analysis. What specific code changes or alternative solution do you recommend? If we don't call the APIs directly, is there a way to obviate the module load penalty?

tkol2022 • Apr 02 '24 14:04

Can you expand on what you mean by server-side "delay"? Also, kindly expand on what you mean by "will not cause the same delays if subsequently run in the terminal." Regarding server-side rate limiting, were you executing the APIs many times in succession over a given time period and noticing that Graph slowed its responses?

tkol2022 • Apr 02 '24 14:04