Allow targeting individual resources for specific Azure clouds (AzurePublicCloud, AzureUSGovernmentCloud, etc.)

Open nojnhuh opened this issue 2 years ago • 16 comments

Describe the current behavior
Currently, ASO exposes a single azureResourceManagerEndpoint value in the Helm chart which applies to all resources managed by that ASO instance.

Describe the improvement
ASO should expose a way to deploy individual resources to specific clouds.

nojnhuh avatar Oct 19 '23 19:10 nojnhuh

A similar mechanism to per-resource clouds could also potentially apply to a particular namespace.

nojnhuh avatar Oct 19 '23 19:10 nojnhuh

What's the use case where a single cluster would be overseeing/managing resources across different clouds?

To this point, my expectation has been that a cluster managing resources in one of the non-public clouds would of necessity also be running within that cloud for security reasons. At first glance, crossing the security boundary between clouds seems as though it may be an issue.

theunrepentantgeek avatar Oct 19 '23 20:10 theunrepentantgeek

> What's the use case where a single cluster would be overseeing/managing resources across different clouds?

I admit I don't know if this is actually what we need or if the main idea is only to make it possible to use different clouds at all. This seems to be the most relevant context I can find from CAPZ making that config local to each workload cluster and I don't see anything implying this was to enable managing resources in different clouds simultaneously: https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/1244.

Being able to specify this per-resource in ASO was mostly a selfish ask because that would likely be easiest to utilize in CAPZ without introducing breaking changes.

> To this point, my expectation has been that a cluster managing resources in one of the non-public clouds would of necessity also be running within that cloud for security reasons. At first glance, crossing the security boundary between clouds seems as though it may be an issue.

This makes me wonder if there is perhaps a more sensible place to expose this in CAPZ.

@CecileRobertMichon Do you know if it's a requirement from CAPZ users or even possible to deploy workload clusters to different Azure clouds from a single management cluster?

nojnhuh avatar Oct 19 '23 21:10 nojnhuh

@nojnhuh, the initial issue in the CAPZ repo was there because we couldn't target Azure Gov clouds at all (without any hacks). The 1 controller:N clouds capability is not really that important to us either.

ionutleca avatar Oct 23 '23 16:10 ionutleca

> I admit I don't know if this is actually what we need or if the main idea is only to make it possible to use different clouds at all.

I think ASO already has full support for the various available Azure clouds.

As you've noted, our Helm chart has the requisite variables, and a review of config.go shows that we support the following environment variables:

  • AZURE_RESOURCE_MANAGER_ENDPOINT - ResourceManagerEndpoint is the Azure Resource Manager endpoint. If not specified, the default is the Public cloud resource manager endpoint.
  • AZURE_RESOURCE_MANAGER_AUDIENCE - ResourceManagerAudience is the Azure Resource Manager AAD audience. If not specified, the default is the Public cloud resource manager audience https://management.core.windows.net/.
  • AZURE_AUTHORITY_HOST - AzureAuthorityHost is the URL of the AAD authority. If not specified, the default is the AAD URL for the public cloud: https://login.microsoftonline.com/.

Maybe a similar approach (environment variables + public cloud defaults) would work for CAPZ?
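
The "environment variables + public cloud defaults" pattern described above can be sketched roughly as follows. This is a minimal illustration, not ASO's actual config.go: the `CloudConfig` type and the `loadCloudConfig` and `valueOrDefault` helpers are hypothetical names, and the default Resource Manager endpoint (`https://management.azure.com/`) is an assumption, since the comment above only spells out the audience and authority defaults.

```go
package main

import (
	"fmt"
	"os"
)

// Public-cloud defaults. The audience and authority values are the ones
// quoted in the comment above; the endpoint value is an assumption.
const (
	defaultEndpoint  = "https://management.azure.com/"
	defaultAudience  = "https://management.core.windows.net/"
	defaultAuthority = "https://login.microsoftonline.com/"
)

// CloudConfig is a hypothetical struct for illustration only.
type CloudConfig struct {
	ResourceManagerEndpoint string
	ResourceManagerAudience string
	AuthorityHost           string
}

// lookupFunc has the same shape as os.Getenv so callers can inject values.
type lookupFunc func(key string) string

// valueOrDefault returns the looked-up value, or fallback when it is empty.
func valueOrDefault(lookup lookupFunc, key, fallback string) string {
	if v := lookup(key); v != "" {
		return v
	}
	return fallback
}

// loadCloudConfig reads the three settings, falling back to public cloud.
func loadCloudConfig(lookup lookupFunc) CloudConfig {
	return CloudConfig{
		ResourceManagerEndpoint: valueOrDefault(lookup, "AZURE_RESOURCE_MANAGER_ENDPOINT", defaultEndpoint),
		ResourceManagerAudience: valueOrDefault(lookup, "AZURE_RESOURCE_MANAGER_AUDIENCE", defaultAudience),
		AuthorityHost:           valueOrDefault(lookup, "AZURE_AUTHORITY_HOST", defaultAuthority),
	}
}

func main() {
	cfg := loadCloudConfig(os.Getenv)
	fmt.Printf("%+v\n", cfg)
}
```

Because the values are read once at startup in this sketch, changing them means restarting the process, which is also why a single set of variables can only describe one cloud per controller instance.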

theunrepentantgeek avatar Oct 23 '23 19:10 theunrepentantgeek

> Maybe a similar approach (environment variables + public cloud defaults) would work for CAPZ?

That seemed to be the approach we took until a couple years ago with https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/1244. @CecileRobertMichon Is there still a requirement that Azure Cloud needs to be specified per-cluster?

nojnhuh avatar Oct 23 '23 19:10 nojnhuh

@nojnhuh today it's possible (before this regression) to deploy workload clusters in different clouds from the same management cluster with CAPZ. I don't know whether there are any users who actively use this feature, but regardless removing it would be a breaking change/regression. The reason it was initially removed from environment variables in the PR you linked above seems to be to not require envsubst to install CAPZ (CAPZ can now be installed without an Azure account / credentials) and all environment substitution happens at the workload cluster manifest apply step.

That said, the current regression seems to be that users can't create clusters in other clouds at all, which is more critical than the multi-cloud aspect, so we should focus on getting that fixed first.

CecileRobertMichon avatar Oct 23 '23 20:10 CecileRobertMichon

@theunrepentantgeek Is it possible to change those environment variables in aso-controller-settings after ASO has started up and could we expect those to take effect while using per-resource secrets for credentials? That seems like a potentially reasonable workaround for now.

nojnhuh avatar Oct 23 '23 20:10 nojnhuh

@CecileRobertMichon @nojnhuh we are actually using multiple clouds from the same management plane. So, we would need this. 🙏

mjnovice avatar Nov 02 '23 15:11 mjnovice

> @theunrepentantgeek Is it possible to change those environment variables in aso-controller-settings after ASO has started up and could we expect those to take effect while using per-resource secrets for credentials? That seems like a potentially reasonable workaround for now.

You can if you restart the ASO pod afterwards.

matthchr avatar Nov 03 '23 02:11 matthchr

We should do an ADR on this, describing how we could solve it (put cloud into credential?)

matthchr avatar Nov 20 '23 23:11 matthchr

We synced up with the CAPZ folks on this and determined the following:

  • We need an ADR (see above).
  • We need to check with CAPZ if per-namespace is good enough or if they also need per-resource. It seems likely we need per-resource, and in any case ASO currently doesn't use a different secret configuration for per-namespace or per-resource so from our side it might be harder to do only per-namespace if that's what we want.
  • CAPZ's opinion was, we should do this, but it's not urgent because there is a somewhat hacky workaround.

matthchr avatar Mar 25 '24 23:03 matthchr

Closing this, not planning on doing it for now. We can reopen if that changes.

matthchr avatar Nov 18 '24 21:11 matthchr

In prior versions of CAPZ we were able to deploy to different Azure clouds using the same management cluster. It makes sense for the management cluster not to cross cloud boundaries, particularly from Public to national clouds (Gov, etc.), but that does not hold true in the opposite direction.

Instead of managing two different clusters, we could run a single management cluster with ASO on it and scope fields like the following at the resource or namespace level, so they can be set for individual resources:

AZURE_RESOURCE_MANAGER_ENDPOINT 
AZURE_RESOURCE_MANAGER_AUDIENCE 
AZURE_AUTHORITY_HOST

shubhamrajvanshi avatar May 30 '25 23:05 shubhamrajvanshi

It's my understanding that security risks occur when different security contexts are mixed, regardless of the direction.

While reaching out to a lower security context from a high security context is less risky than the opposite, it's not risk free, and we have concerns about enabling cross-cloud support because of this.

theunrepentantgeek avatar Jun 02 '25 23:06 theunrepentantgeek

Thanks for raising this — you're right that mixing security contexts carries risk. However, supporting cross-cloud authentication with namespace and resource-level scoping doesn't inherently violate security principles if proper boundaries are maintained.

Ultimately, it's the responsibility of users to scope credentials appropriately and apply least privilege access. By enforcing these boundaries — for example, isolating credentials per namespace — the risks can be minimized while still enabling necessary flexibility for multi-cloud environments.

ASO should provide the mechanisms; users must define the policies that ensure secure usage.

shubhamrajvanshi avatar Jun 03 '25 04:06 shubhamrajvanshi

After further internal discussion, @matthchr and I see a way forward, with something like this:

Add ALLOW_MULTI_ENV_MANAGEMENT (or possibly ALLOW_MULTI_CLOUD_MANAGEMENT) to the global configuration of ASO. This would default to off.

When turned on, ALLOW_MULTI_ENV_MANAGEMENT would allow per-namespace and per-resource credentials to specify AZURE_RESOURCE_MANAGER_ENDPOINT, AZURE_RESOURCE_MANAGER_AUDIENCE, and AZURE_AUTHORITY_HOST to identify which cloud to use for that namespace/resource.

  • Inheritance/precedence would be a pain (and open to subtle security errors), so we'd require ALL or NONE of these options to be present.
  • If ALLOW_MULTI_ENV_MANAGEMENT is off, NONE of those options are permitted; reconciliation would fail.
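
A rough sketch of the ALL-or-NONE rule proposed above, assuming the credential secret has already been read into a map of strings. Nothing here exists in ASO today: `validateCloudSettings`, `cloudKeys`, and the two error values are hypothetical names chosen for illustration, and the US Government URLs in `main` are example values only.

```go
package main

import (
	"errors"
	"fmt"
)

// cloudKeys are the three settings that together identify a cloud.
var cloudKeys = []string{
	"AZURE_RESOURCE_MANAGER_ENDPOINT",
	"AZURE_RESOURCE_MANAGER_AUDIENCE",
	"AZURE_AUTHORITY_HOST",
}

// Hypothetical errors that would cause reconciliation to fail.
var (
	errNotAllowed = errors.New("cloud settings present but multi-environment management is disabled")
	errPartial    = errors.New("cloud settings must be all present or all absent")
)

// validateCloudSettings reports whether the credential selects its own cloud.
// It returns (false, nil) when none of the keys are set, meaning the global
// configuration applies, and an error when the settings are not permitted
// or only partially specified.
func validateCloudSettings(allowMultiEnv bool, secret map[string]string) (bool, error) {
	present := 0
	for _, key := range cloudKeys {
		if secret[key] != "" {
			present++
		}
	}

	switch {
	case present == 0:
		return false, nil // NONE: fall back to the global cloud
	case !allowMultiEnv:
		return false, errNotAllowed // flag off: no overrides permitted
	case present != len(cloudKeys):
		return false, errPartial // neither ALL nor NONE
	default:
		return true, nil // ALL: use the cloud named by this credential
	}
}

func main() {
	// Example values for illustration only.
	secret := map[string]string{
		"AZURE_RESOURCE_MANAGER_ENDPOINT": "https://management.usgovcloudapi.net/",
		"AZURE_RESOURCE_MANAGER_AUDIENCE": "https://management.core.usgovcloudapi.net/",
		"AZURE_AUTHORITY_HOST":            "https://login.microsoftonline.us/",
	}
	override, err := validateCloudSettings(true, secret)
	fmt.Println(override, err)
}
```

Rejecting partial settings outright avoids having to define how a missing value would be inherited from the global configuration, which is the precedence problem the first bullet calls out.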

No promises on timeframes, but we'll put this near the top of our priorities.

theunrepentantgeek avatar Jul 08 '25 00:07 theunrepentantgeek

@matthchr @theunrepentantgeek, I'm willing to work on this; kindly let me know.

shubhamrajvanshi avatar Oct 04 '25 04:10 shubhamrajvanshi

@shubhamrajvanshi if this is something you're looking for, feel free to send us a PR and link this issue. AFAIK nobody on the ASO team has started work on this.

matthchr avatar Oct 06 '25 18:10 matthchr