No azure identity found for request clientID using azure-private-dns provider

Open khauser opened this issue 1 year ago • 9 comments

What happened:

I tried the commands from this tutorial, adapted to a private DNS zone: https://github.com/kubernetes-sigs/external-dns/blob/master/docs/tutorials/azure.md#managed-identity-using-aks-kubelet-identity

PRINCIPAL_ID=$(az aks show --subscription $CLUSTER_SUBSCRIPTION --resource-group $CLUSTER_GROUP --name $CLUSTERNAME \
  --query "identityProfile.kubeletidentity.objectId" --output tsv)
IDENTITY_CLIENT_ID=$(az aks show --subscription $CLUSTER_SUBSCRIPTION --resource-group $CLUSTER_GROUP --name $CLUSTERNAME \
  --query "identityProfile.kubeletidentity.clientId" --output tsv)

AZURE_DNS_ZONE="<example.com>" # DNS zone name like example.com or sub.example.com
AZURE_DNS_ZONE_RESOURCE_GROUP="<ours>" # resource group where DNS zone is hosted
AZURE_DNS_ZONE_SUBSCRIPTION="<ours>"

# fetch DNS id used to grant access to the kubelet identity
DNS_ID=$(az network private-dns zone show --name $AZURE_DNS_ZONE \
  --subscription $AZURE_DNS_ZONE_SUBSCRIPTION --resource-group $AZURE_DNS_ZONE_RESOURCE_GROUP --query "id" --output tsv)

az role assignment create --role "Private DNS Zone Contributor" --assignee $PRINCIPAL_ID --scope $DNS_ID # also fails with IDENTITY_CLIENT_ID

The last command fails with: If the assignee is an appId, make sure the corresponding service principal is created with 'az ad sp create --id

However, I was able to create the role assignment manually through the Azure Portal UI.

Even with the role assigned, the external-dns pod then reports: no azure identity found for request clientID
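
The role-assignment error above appears when the CLI tries to resolve the value passed to `--assignee` in the directory. A minimal sketch of one possible workaround, assuming `PRINCIPAL_ID` really holds the kubelet identity's object ID: pass it through `--assignee-object-id` together with `--assignee-principal-type ServicePrincipal`, which are documented options of `az role assignment create` intended to skip that lookup. Whether this resolves the failure in this setup was not confirmed in the thread. The helper name `assign_dns_role` and the `DRY_RUN` switch are hypothetical and only there for illustration.

```shell
# Hypothetical helper: builds the role-assignment command with an explicit
# object ID so the CLI does not have to look the assignee up itself.
# $1 = object ID of the kubelet identity, $2 = resource ID of the DNS zone.
assign_dns_role() {
  set -- az role assignment create \
    --role "Private DNS Zone Contributor" \
    --assignee-object-id "$1" \
    --assignee-principal-type ServicePrincipal \
    --scope "$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run (the default): only print the command that would be executed.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Print the command for the values fetched earlier (placeholders if unset).
DRY_RUN=1 assign_dns_role "${PRINCIPAL_ID:-<object-id>}" "${DNS_ID:-<zone-id>}"
```

Setting `DRY_RUN=0` runs the command instead of printing it.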

What you expected to happen:

Both the Azure role assignment and the external-dns pod should find the AKS-assigned managed identity.

How to reproduce it (as minimally and precisely as possible):

external-dns configuration file "helm/external-dns-values.yaml":

extraVolumeMounts:
  - name: azure-config-file
    mountPath: /etc/kubernetes
    readOnly: true
extraVolumes:
  - name: azure-config-file
    secret:
      secretName: azure-config-file
sources:
  - ingress
  - service

External-dns installation script:

helm repo add external-dns https://kubernetes-sigs.github.io/external-dns
helm repo update

subscriptionId=$(az account show --subscription $AZURE_DNS_ZONE_SUBSCRIPTION --query "id" --output tsv)
tenantId=$(az account show --query "tenantId" --output tsv)

cat <<EOF > azure.json
{
  "tenantId": "$tenantId",
  "subscriptionId": "$subscriptionId",
  "resourceGroup": "$AZURE_DNS_ZONE_RESOURCE_GROUP",
  "useManagedIdentityExtension": true,
  "userAssignedIdentityID": "<the kubelet identity>"
}
EOF

namespace="external-dns"
if ! kubectl get namespace "$namespace" &> /dev/null; then
  kubectl create namespace "$namespace"
fi

secret_name="azure-config-file"
if kubectl get secret "$secret_name" -n "$namespace" &> /dev/null; then
  kubectl delete secret $secret_name -n "$namespace"
fi
kubectl create secret generic $secret_name -n external-dns --from-file azure.json


helm upgrade --install external-dns external-dns/external-dns --version 1.13.1 \
    --wait \
    --namespace external-dns \
    --set provider=azure-private-dns \
    --set logLevel=debug \
    --values ./helm/external-dns-values.yaml \
    --create-namespace
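
Two details of the script above are easy to get wrong: appending to `azure.json` with `>>` leaves two JSON objects in the file on a second run, and `userAssignedIdentityID` is a placeholder. A minimal sketch of an idempotent writer follows, assuming the field is meant to hold the client ID fetched earlier into `IDENTITY_CLIENT_ID` (not the object ID). That assumption is not confirmed in this thread. The helper name `write_azure_json` is hypothetical.

```shell
# Hypothetical helper: writes the config file from scratch on every run.
# $1 = path of the file to write.
write_azure_json() {
  # '>' truncates the target first; '>>' would append a second JSON object
  # whenever the script is run again.
  cat <<EOF > "$1"
{
  "tenantId": "$tenantId",
  "subscriptionId": "$subscriptionId",
  "resourceGroup": "$AZURE_DNS_ZONE_RESOURCE_GROUP",
  "useManagedIdentityExtension": true,
  "userAssignedIdentityID": "$IDENTITY_CLIENT_ID"
}
EOF
}
```

Calling `write_azure_json azure.json` before the `kubectl create secret` step would then always produce a single JSON object.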

Our aks.bicep, in which the managed identity is created by default:

param ppgId string
param aksName string
param aksVersion string
param aksVMSize string
param aksMaxScalingCount int
param vnetIPPrefix int

@secure()
param sshPublicKey string

var location = resourceGroup().location
var resourceGroupName = resourceGroup().name
var subscriptionId = subscription().subscriptionId

resource aksModule 'Microsoft.ContainerService/managedClusters@2023-07-01' = {
  name: aksName
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  sku: {
    name: 'Base'
    tier: 'Standard'
  }
  properties: {
    kubernetesVersion: aksVersion
    dnsPrefix: aksName
    addonProfiles: {
      azureKeyvaultSecretsProvider: {
        enabled: true
        config:{
          enableSecretRotation: 'true'
        }
      }
    }
    agentPoolProfiles: [
      {
        name: 'agentpool'
        vmSize: 'Standard_DS2_v2' // just a small machine for the "unneeded" machine
        osDiskSizeGB: 30
        count: 1
        osType: 'Linux'
        mode: 'System'
        vnetSubnetID: resourceId('Microsoft.Network/virtualNetworks/subnets', '${aksName}-vnet', 'default')
        type: 'VirtualMachineScaleSets'
        orchestratorVersion: aksVersion
        enableNodePublicIP: false
        proximityPlacementGroupID: ppgId
        enableAutoScaling: true
        minCount: 1
        maxCount: 1
      }
      {
        name: 'agentpool2'
        vmSize: aksVMSize
        osDiskSizeGB: 128
        count: 0
        osType: 'Linux'
        mode: 'User'
        vnetSubnetID: resourceId('Microsoft.Network/virtualNetworks/subnets', '${aksName}-vnet', 'default')
        type: 'VirtualMachineScaleSets'
        orchestratorVersion: aksVersion
        enableNodePublicIP: false
        maxPods: 35
        enableAutoScaling: true
        minCount: 0
        maxCount: aksMaxScalingCount
      }
    ]
    autoScalerProfile: {
      'scale-down-utilization-threshold': '0.20'
    }
    networkProfile: {
      loadBalancerSku: 'standard'
      networkPlugin: 'azure'
      serviceCidr: '10.208.${vnetIPPrefix}.0/20'
      dnsServiceIP: '10.208.${vnetIPPrefix}.10'
      dockerBridgeCidr: '173.17.0.1/16'
    }
    linuxProfile: {
      adminUsername: 'localhorst'
      ssh: {
        publicKeys: [
          {
            keyData: sshPublicKey
          }
        ]
      }
    }
    enableRBAC: true
  }
  dependsOn: [
    aksName_vnet
  ]
}

resource aksName_vnet 'Microsoft.Network/virtualNetworks@2020-06-01' = {
  name: '${aksName}-vnet'
  location: location
  properties: {
    subnets: [
      {
        name: 'default'
        id: '/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.Network/virtualNetworks/${aksName}-vnet/subnets/default'
        properties: {
          addressPrefix: '10.208.${(vnetIPPrefix + 16)}.0/20'
          //10.208.16.0 - 10.208.16.255 prod
          //10.208.48.0 - 10.208.48.255 test
        }
      }
    ]
    addressSpace: {
      addressPrefixes: [
        '10.208.${vnetIPPrefix}.0/19'
        //10.208.0.0 - 10.208.31.255 prod
        //10.208.32.0 - 10.208.63.255 test
      ]
    }
  }
  tags: {}
}

output agentPoolIdentity object = aksModule.properties.identityProfile.kubeletidentity
output agentPoolIdentityObjectId string = aksModule.properties.identityProfile.kubeletidentity.objectId
output aksKeyVaultProviderIdentity object = aksModule.properties.addonProfiles.azureKeyvaultSecretsProvider.identity
output aksKeyVaultProviderIdentityObjectId string = aksModule.properties.addonProfiles.azureKeyvaultSecretsProvider.identity.objectId

Anything else we need to know?: Please ask if you need more information.

Environment: Azure AKS 1.27.3

  • External-DNS version (use external-dns --version): v0.13.4
  • DNS provider: azure-private-dns

khauser, Jan 03 '24 13:01

I have now also assigned "Reader" on the resource group that contains the private DNS zone, but external-dns is still failing:

time="2024-01-05T09:31:50Z" level=info msg="config: {APIServerURL: KubeConfig: RequestTimeout:30s DefaultTargets:[] GlooNamespaces:[gloo-system] SkipperRouteGroupVersion:zalando.org/v1 Sources:[ingress service] Namespace: AnnotationFilter: LabelFilter: IngressClassNames:[] FQDNTemplate: CombineFQDNAndAnnotation:false IgnoreHostnameAnnotation:false IgnoreIngressTLSSpec:false IgnoreIngressRulesSpec:false GatewayNamespace: GatewayLabelFilter: Compatibility: PublishInternal:false PublishHostIP:false AlwaysPublishNotReadyAddresses:false ConnectorSourceServer:localhost:8080 Provider:azure-private-dns GoogleProject: GoogleBatchChangeSize:1000 GoogleBatchChangeInterval:1s GoogleZoneVisibility: DomainFilter:[test.intershop.com] ExcludeDomains:[] RegexDomainFilter: RegexDomainExclusion: ZoneNameFilter:[] ZoneIDFilter:[] TargetNetFilter:[] ExcludeTargetNets:[] AlibabaCloudConfigFile:/etc/kubernetes/alibaba-cloud.json AlibabaCloudZoneType: AWSZoneType: AWSZoneTagFilter:[] AWSAssumeRole: AWSAssumeRoleExternalID: AWSBatchChangeSize:1000 AWSBatchChangeInterval:1s AWSEvaluateTargetHealth:true AWSAPIRetries:3 AWSPreferCNAME:false AWSZoneCacheDuration:0s AWSSDServiceCleanup:false AWSDynamoDBRegion: AWSDynamoDBTable:external-dns AzureConfigFile:/etc/kubernetes/azure.json AzureResourceGroup: AzureSubscriptionID: AzureUserAssignedIdentityClientID: BluecatDNSConfiguration: BluecatConfigFile:/etc/kubernetes/bluecat.json BluecatDNSView: BluecatGatewayHost: BluecatRootZone: BluecatDNSServerName: BluecatDNSDeployType:no-deploy BluecatSkipTLSVerify:false CloudflareProxied:false CloudflareDNSRecordsPerPage:100 CoreDNSPrefix:/skydns/ RcodezeroTXTEncrypt:false AkamaiServiceConsumerDomain: AkamaiClientToken: AkamaiClientSecret: AkamaiAccessToken: AkamaiEdgercPath: AkamaiEdgercSection: InfobloxGridHost: InfobloxWapiPort:443 InfobloxWapiUsername:admin InfobloxWapiPassword: InfobloxWapiVersion:2.3.1 InfobloxSSLVerify:true InfobloxView: InfobloxMaxResults:0 InfobloxFQDNRegEx: InfobloxNameRegEx: 
InfobloxCreatePTR:false InfobloxCacheDuration:0 DynCustomerName: DynUsername: DynPassword: DynMinTTLSeconds:0 OCIConfigFile:/etc/kubernetes/oci.yaml OCICompartmentOCID: OCIAuthInstancePrincipal:false InMemoryZones:[] OVHEndpoint:ovh-eu OVHApiRateLimit:20 PDNSServer:http://localhost:8081 PDNSAPIKey: PDNSSkipTLSVerify:false TLSCA: TLSClientCert: TLSClientCertKey: Policy:upsert-only Registry:txt TXTOwnerID:default TXTPrefix: TXTSuffix: TXTEncryptEnabled:false TXTEncryptAESKey: Interval:1m0s MinEventSyncInterval:5s Once:false DryRun:false UpdateEvents:false LogFormat:text MetricsAddress::7979 LogLevel:debug TXTCacheInterval:0s TXTWildcardReplacement: ExoscaleEndpoint: ExoscaleAPIKey: ExoscaleAPISecret: ExoscaleAPIEnvironment:api ExoscaleAPIZone:ch-gva-2 CRDSourceAPIVersion:externaldns.k8s.io/v1alpha1 CRDSourceKind:DNSEndpoint ServiceTypeFilter:[] CFAPIEndpoint: CFUsername: CFPassword: ResolveServiceLoadBalancerHostname:false RFC2136Host: RFC2136Port:0 RFC2136Zone: RFC2136Insecure:false RFC2136GSSTSIG:false RFC2136KerberosRealm: RFC2136KerberosUsername: RFC2136KerberosPassword: RFC2136TSIGKeyName: RFC2136TSIGSecret: RFC2136TSIGSecretAlg: RFC2136TAXFR:false RFC2136MinTTL:0s RFC2136BatchChangeSize:50 NS1Endpoint: NS1IgnoreSSL:false NS1MinTTLSeconds:0 TransIPAccountName: TransIPPrivateKeyFile: DigitalOceanAPIPageSize:50 ManagedDNSRecordTypes:[A AAAA CNAME] ExcludeDNSRecordTypes:[] GoDaddyAPIKey: GoDaddySecretKey: GoDaddyTTL:0 GoDaddyOTE:false OCPRouterName: IBMCloudProxied:false IBMCloudConfigFile:/etc/kubernetes/ibmcloud.json TencentCloudConfigFile:/etc/kubernetes/tencent-cloud.json TencentCloudZoneType: PiholeServer: PiholePassword: PiholeTLSInsecureSkipVerify:false PluralCluster: PluralProvider: WebhookProviderURL:http://localhost:8888 WebhookProviderReadTimeout:5s WebhookProviderWriteTimeout:10s WebhookServer:false}"
time="2024-01-05T09:31:50Z" level=info msg="Instantiating new Kubernetes client"
time="2024-01-05T09:31:50Z" level=debug msg="apiServerURL: "
time="2024-01-05T09:31:50Z" level=debug msg="kubeConfig: "
time="2024-01-05T09:31:50Z" level=info msg="Using inCluster-config based on serviceaccount-token"
time="2024-01-05T09:31:50Z" level=info msg="Created Kubernetes client https://10.208.0.1:443"
time="2024-01-05T09:31:51Z" level=info msg="Using managed identity extension to retrieve access token for Azure API."
time="2024-01-05T09:31:51Z" level=debug msg="Retrieving Azure Private DNS zones for Resource Group 'plc-private-dns'"
time="2024-01-05T09:33:35Z" level=fatal msg="ManagedIdentityCredential authentication failed\nGET http://169.254.169.254/metadata/identity/oauth2/token\n--------------------------------------------------------------------------------\nRESPONSE 404 Not Found\n--------------------------------------------------------------------------------\nno azure identity found for request clientID <mine>\n\n--------------------------------------------------------------------------------\nTo troubleshoot, visit https://aka.ms/azsdk/go/identity/troubleshoot#managed-id"
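
The fatal message shows that the token request to the instance metadata endpoint returns 404 for the requested client ID. One way to narrow this down is to repeat that request by hand from a pod or node. This is a minimal sketch, assuming the usual token query parameters (`api-version`, `resource`, `client_id`) and the `Metadata: true` header. The helper name `imds_token_url` is hypothetical.

```shell
# Hypothetical helper: builds the metadata-endpoint token URL for a client ID.
# $1 = client ID of the managed identity to request a token for.
imds_token_url() {
  printf '%s?api-version=%s&resource=%s&client_id=%s\n' \
    "http://169.254.169.254/metadata/identity/oauth2/token" \
    "2018-02-01" \
    "https://management.azure.com/" \
    "$1"
}

# To send the request (needs network access to the metadata endpoint):
#   curl -s -H "Metadata: true" "$(imds_token_url "$IDENTITY_CLIENT_ID")"
imds_token_url "${IDENTITY_CLIENT_ID:-<client-id>}"
```

If the manual request also returns 404, the problem probably lies with the identity available to the node rather than with external-dns itself.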

khauser, Jan 05 '24 09:01

I have now tried it with a dedicated service principal and also with a workload identity. Both work; only the kubelet identity does not.

khauser, Jan 11 '24 15:01

I am also seeing this issue.

PixelRobots, Mar 26 '24 14:03

same issue

ghost, Apr 01 '24 19:04

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot, Jun 30 '24 19:06

/lifecycle rotten

k8s-triage-robot, Jul 30 '24 20:07

/remove-lifecycle rotten

khauser, Jul 31 '24 09:07

any update on this?

Chandra2614, Aug 20 '24 07:08

In reply to khauser's comment that a service principal and a workload identity both work while the kubelet identity does not:

Mine is not working with workload identity. I am passing the client ID of a user-assigned identity (which has a federated credential) while running from Azure DevOps self-hosted runners.

Chandra2614, Aug 20 '24 09:08

/lifecycle stale

k8s-triage-robot, Nov 18 '24 09:11

/lifecycle rotten

k8s-triage-robot, Dec 18 '24 10:12

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot, Jan 17 '25 10:01

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

k8s-ci-robot, Jan 17 '25 10:01