pulumi-azure
ServicePrincipal eventual consistency issue when creating AKS cluster
When attempting to create an AKS cluster, the first run (or two) always errors out: even though the new ServicePrincipal was created successfully, IAM eventual consistency hasn't propagated far enough for the AKS service to read it, resulting in an HTTP 400:
azure:containerservice:KubernetesCluster (k8s-az-cluster):
error: Plan apply failed: Error creating Managed Kubernetes Cluster "k8s-az-cluster99144d8f" (Resource Group "k8s-az-cluster766bbb7b"): containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=400 --
Original Error: Code="ServicePrincipalNotFound" Message="Service principal clientID: e6cba73e-5217-4a0b-a101-cc3430e96b8f not found in Active Directory tenant 706143bc-e1d4-
4593-aee2-c9dc60ab9be7, Please see https://aka.ms/aks-sp-help for more details."
Re-running pulumi update once or twice after the initial error surfaces gets around this issue, but it is a nuisance.
A repro lives here for AKS: https://github.com/metral/demo-multicloud/blob/master/aks.ts
This is probably caused by https://github.com/terraform-providers/terraform-provider-azuread/issues/156. However, I don't think it always fails on the first try: I've had the full update succeed on the first go. It might depend on the region too.
Interestingly, my attempts always fail on the first run in westus2.
From that comment, with respect to the "Move identity code from cluster config stack to identity stack" item of https://github.com/pulumi/docs/issues/1868:
I have seen people add null resources / local-exec with a sleep before, but that is far from ideal. Creating the SP and creds separately would most likely solve the issue for you, but that is also not ideal.
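The "sleep" workaround above can be approximated in a Pulumi TypeScript program without a null resource: since an Output's apply callback may return a Promise, the credential can be wrapped in a delayed promise so the dependent cluster starts later. This is a hedged sketch, not a real fix; the helper name and the 30-second delay are assumptions, and the commented usage refers to the adSpPassword resource from the linked repro.

```typescript
// Sketch of the "sleep" workaround: resolve a value only after `ms`
// milliseconds, giving Azure AD replication time to catch up.
function delay<T>(value: T, ms: number): Promise<T> {
    return new Promise(resolve => setTimeout(() => resolve(value), ms));
}

// In a Pulumi program, an Output's .apply may return a Promise, so a
// resource consuming the delayed output won't start until it resolves:
//
//   const delayedSecret = adSpPassword.value.apply(v => delay(v, 30_000));
//   // ...pass delayedSecret as the cluster's servicePrincipal.clientSecret...
```

Like the local-exec sleep, this only papers over the race with a fixed wait and can still fail if propagation takes longer than the chosen delay.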
This sounds like it could be resolved when we split the identity out of the cluster stack into its own identity stack, and then reference it from the cluster via a stack reference, instead of provisioning everything together in the cluster stack as we do now.
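A minimal sketch of what that split might look like on the cluster side, assuming the identity stack has already converged and exports its SP credentials; the stack name "myorg/identity/prod" and the output names "clientId"/"clientSecret" are assumptions, not anything from the repro:

```typescript
import * as pulumi from "@pulumi/pulumi";

// The identity stack (deployed earlier, so AD replication has had time to
// settle) exports the service principal credentials as stack outputs.
const identity = new pulumi.StackReference("myorg/identity/prod");

const clientId = identity.getOutput("clientId");
const clientSecret = identity.getOutput("clientSecret");

// ...pass clientId/clientSecret into the KubernetesCluster's
// servicePrincipal args instead of creating the SP in this stack...
```

Because the SP exists well before the cluster stack runs, the eventual-consistency window is much less likely to bite.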
Making the AD app a dependency of the cluster should work. You may try adding { dependsOn: [adApp, adSpPassword] } to the cluster's resource options.
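In context, that suggestion looks roughly like the following sketch; the resource names follow the linked repro, and the other required cluster arguments (resource group, node pool, etc.) are elided:

```typescript
// Make the cluster explicitly depend on the AD application and the SP
// password so Pulumi won't begin creating it until both have completed.
const cluster = new azure.containerservice.KubernetesCluster("k8s-az-cluster", {
    // ...resourceGroupName, defaultNodePool, dnsPrefix, etc...
    servicePrincipal: {
        clientId: adApp.applicationId,
        clientSecret: adSpPassword.value,
    },
}, { dependsOn: [adApp, adSpPassword] });
```

Note that Pulumi already infers these dependencies from the servicePrincipal inputs, so the explicit dependsOn mainly serializes the creations; as noted below, it does not guarantee the SP is visible to AKS by the time the cluster request is sent.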
@kenny-wealth I believe this doesn't always help: the SP is reported as created, but is still not visible internally to the AKS service.
It's certainly not a fix, but it worked for me, because previously the cluster and the app started creating at the same time.