terraform-provider-kubernetes

Authenticate `kubernetes` provider to AKS using Terraform Cloud Dynamic Credentials for Azure

Open · jeffhuenemann opened this issue 1 year ago · 9 comments

Description

One-liner:

  • I want the kubernetes Terraform provider to work with Entra-enabled AKS without managing any secrets (just OIDC federations).

Scenario:

  • Have an AKS cluster, pre-created in a separate Terraform codebase/run, with managed Entra ID integration.
  • Creating a Terraform module that uses both the azurerm and kubernetes providers, for onboarding new apps/APIs into the AKS cluster (azurerm_user_assigned_identity, kubernetes_namespace_v1, kubernetes_service_account_v1, etc.).
  • Using Terraform Cloud with a workspace that is configured with Dynamic Credentials for Azure; it authenticates the azurerm provider perfectly.
  • The Azure identity being targeted for dynamic credentials holds:
    • Owner role of the resource group where the azurerm resources go
    • the Azure Kubernetes Service RBAC Cluster Admin role, sufficient to make any changes through the Kubernetes API of the AKS cluster

Manual version illustrating a similar idea:

# Login to Azure
az login # use whatever details/parameters for your environment

# Convert kubeconfig to inherit the Azure CLI credential you've already established
# This switches kubeconfig to use an `exec` to `kubelogin`
kubelogin convert-kubeconfig -l azurecli

# Now, do stuff with kubectl
kubectl get nodes -o wide

# Each call of kubectl runs `kubelogin get-token` to get a short-lived credential, inheriting the identity already captured for Azure

Goal:

  • The kubernetes Terraform provider is able to take on the same identity being pulled in by the azurerm provider, using that identity to call the AKS cluster's Kubernetes API when provisioning kubernetes_* resources
  • have zero secrets to store/rotate/protect (as is accomplished by the azurerm provider federating via OIDC)

Potential Terraform Configuration

I can imagine two ways to do this:

Option 1: kubernetes provider can be told to use the same Azure Dynamic Credentials as the azurerm provider

terraform {
  cloud {
    organization = "my-org"
    workspaces {
      name = "this-workspace" # this workspace is set up for dynamic Azure credentials
    }
  }

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.113.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.31.0"
    }
  }
}

provider "azurerm" {
  features {
    # Empty, we don't need anything special, but this block has to be here
  }

  # the main provider configuration comes from the following environment variables being set in the TFC workspace, per
  # https://developer.hashicorp.com/terraform/cloud-docs/workspaces/dynamic-provider-credentials/azure-configuration#configure-the-azurerm-or-microsoft-entra-id-provider
  # 
  # ARM_TENANT_ID = <our tenant id>
  # ARM_SUBSCRIPTION_ID = <our subscription id>
  # TFC_AZURE_PROVIDER_AUTH = true
  # TFC_AZURE_RUN_CLIENT_ID = <the client id of our pipeline credential that is configured to accept oidc>
}

data "azurerm_kubernetes_cluster" "aks" {
  resource_group_name = local.cluster_resource_group_name
  name                = local.cluster_name
}

provider "kubernetes" {
  host                              = data.azurerm_kubernetes_cluster.aks.kube_config.0.host
  cluster_ca_certificate            = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config.0.cluster_ca_certificate)
  use_tfc_azure_dynamic_credentials = true # <== this is the thing that would have to be invented, maybe borrowing code from `azurerm` provider
}

# Off in a module somewhere:
# This resource is provisioned by the `kubernetes` provider, but using the Azure dynamic credential
resource "kubernetes_namespace_v1" "ns" {
  metadata {
    name = local.kubernetes_namespace_name
    labels = {
      # ...
    }
  }
}

Option 2: kubernetes provider exchanges the TFC-provided OIDC token on its own:

terraform {
  cloud {
    organization = "my-org"
    workspaces {
      name = "this-workspace" # this workspace is set up for dynamic Azure credentials
    }
  }

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.113.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.31.0"
    }
  }
}

provider "azurerm" {
  features {
    # Empty, we don't need anything special, but this block has to be here
  }

  # the main provider configuration comes from the following environment variables being set in the TFC workspace, per
  # https://developer.hashicorp.com/terraform/cloud-docs/workspaces/dynamic-provider-credentials/azure-configuration#configure-the-azurerm-or-microsoft-entra-id-provider
  # 
  # ARM_TENANT_ID = <our tenant id>
  # ARM_SUBSCRIPTION_ID = <our subscription id>
  # TFC_AZURE_PROVIDER_AUTH = true
  # TFC_AZURE_RUN_CLIENT_ID = <the client id of our pipeline credential that is configured to accept oidc>
}

data "azurerm_kubernetes_cluster" "aks" {
  resource_group_name = local.cluster_resource_group_name
  name                = local.cluster_name
}

# https://developer.hashicorp.com/terraform/cloud-docs/workspaces/dynamic-provider-credentials/azure-configuration#required-terraform-variable
# This "magic" variable is populated by the TFC workspace at runtime,
# and is especially required if you have multiple instances of the `azurerm` provider with aliases
variable "tfc_azure_dynamic_credentials" {
  description = "Object containing Azure dynamic credentials configuration"
  type = object({
    default = object({
      client_id_file_path  = string
      oidc_token_file_path = string
    })
    aliases = map(object({
      client_id_file_path  = string
      oidc_token_file_path = string
    }))
  })
}

data "azurerm_client_config" "current" {}

provider "kubernetes" {
  host                              = data.azurerm_kubernetes_cluster.aks.kube_config.0.host
  cluster_ca_certificate            = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config.0.cluster_ca_certificate)
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "kubelogin"
    args = [
      "get-token",
      "--environment", "AzurePublicCloud",
      "--server-id", "6dae42f8-4368-4678-94ff-3960e28e3630", # Always the same, https://azure.github.io/kubelogin/concepts/aks.html
      # With workloadidentity login, this must be the client ID of the identity that trusts the TFC token
      # (the dynamic-credentials identity), not the AKS client app 80faf920-1908-4b52-b5ef-a8e7bedfc67a
      "--client-id", data.azurerm_client_config.current.client_id,
      "--tenant-id", data.azurerm_kubernetes_cluster.aks.azure_active_directory_role_based_access_control.0.tenant_id,
      "--authority-host", "https://login.microsoftonline.com/", # host only; the tenant comes from --tenant-id
      "--login", "workloadidentity",
      "--federated-token-file", var.tfc_azure_dynamic_credentials.default.oidc_token_file_path
    ]
  }
}

# Off in a module somewhere:
# This resource is provisioned by the `kubernetes` provider, but using the Azure dynamic credential
resource "kubernetes_namespace_v1" "ns" {
  metadata {
    name = local.kubernetes_namespace_name
    labels = {
      # ...
    }
  }
}

Notes:

  • 📝 This option requires kubelogin to be available within the context of the Terraform run. We need a self-hosted TFC agent anyway because the cluster is private (the TFC-provided agents have no line of sight to its Kubernetes API), and we have installed kubelogin on it ourselves.
  • When the Azure Dynamic Credentials are set up, TFC places a valid JWT at the path /home/tfc-agent/.tfc-agent/component/terraform/runs/{run-id-here}/tfc-azure-token, with an issuer of https://app.terraform.io and an audience of api://AzureADTokenExchange, but using that JWT with kubelogin isn't working.
  • If I manually run the kubelogin get-token command specified in my kubeconfig after kubelogin convert-kubeconfig -l azurecli, I get a JWT with an issuer of https://sts.windows.net/{my-tenant-id-here}/ and an audience of 6dae42f8-4368-4678-94ff-3960e28e3630, the well-known Entra ID application ID for AKS that is the same for every customer. I believe this JWT is what is being submitted with calls to the Kubernetes API.
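To compare the claims of the two tokens described above (issuer and audience, and also `sub` and `iat`), it can help to decode the JWT payload locally. A minimal POSIX shell sketch, assuming a `base64` that supports `-d` (GNU coreutils or BusyBox); the function name `jwt_payload` is hypothetical, and it only decodes the claims, it does not verify the signature:

```shell
# Print the payload (claims) segment of a JWT as JSON.
# NOTE: decoding only; the signature is NOT verified.
jwt_payload() {
  # Take the middle segment and map the base64url alphabet back to base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | sed -e 's/-/+/g' -e 's|_|/|g')
  # base64url drops the '=' padding; restore it so base64 -d accepts the input.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}
```

For example, `jwt_payload "$(cat /home/tfc-agent/.tfc-agent/component/terraform/runs/{run-id-here}/tfc-azure-token)"` prints the TFC token's claims; piping the output through `jq .` pretty-prints them.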

References

  • relates #2072
  • relates hashicorp/terraform-provider-helm#1114

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

jeffhuenemann · Oct 17 '24

Hi,

I just want to make sure that I understand what exactly is the ask here. So would using a configuration like you presented in Option 2, which would work, not exactly meet your expectations?

alexsomesan · Oct 28 '24

@alexsomesan Thanks for the reply. Option 2 would totally meet the need (if it worked), and at that point I would propose that this issue be solved with just a documentation update showing a workable solution using the exec { } method. As-is, at least when I tried it, I couldn't get the provider to authenticate using the token that TFC presents in that location.

Option 1 would also meet the need, and would rhyme more with how the azurerm and azuread providers are able to infer their authentication using just a couple of environment variables that feed the Azure SDK in the provider.

However it ends up being accomplished, the ultimate goal is that the kubernetes provider (like azurerm and azuread) can authenticate to an AKS cluster using TFC dynamic Azure credentials.

jeffhuenemann · Oct 30 '24

I have a somewhat hacky workaround for this (in my case, using a GitHub Actions ID token as the credential):

locals {
  github_id_token_azure_filename = "/tmp/.github-id-token-azure"
}

data "azurerm_client_config" "current" {
}

data "azurerm_kubernetes_cluster" "aks" {
  name                = var.aks_name
  resource_group_name = var.aks_resource_group
}

data "http" "github_id_token_azure" {
  url = "${var.github_id_token_request_url}&audience=api://AzureADTokenExchange"
  request_headers = {
    Authorization = "bearer ${var.github_id_token_request_token}"
    Accept        = "application/json; api-version=2.0"
  }
}

locals {
  kubernetes_credentials = {
    host = one(data.azurerm_kubernetes_cluster.aks.kube_config).host
    cluster_ca_certificate = base64decode(
      one(data.azurerm_kubernetes_cluster.aks.kube_config)
      .cluster_ca_certificate
    )
    exec_api_version = "client.authentication.k8s.io/v1beta1"
    exec_command     = "/bin/bash"
    exec_args = [
      "-c",
      join(" ", [
        "echo \"${jsondecode(data.http.github_id_token_azure.response_body).value}\"",
        "> ${local.github_id_token_azure_filename}",
        "&&",
        "kubelogin",
        "get-token",
        "--login",
        "workloadidentity",
        "--server-id",
        "6dae42f8-4368-4678-94ff-3960e28e3630", # See https://azure.github.io/kubelogin/concepts/aks.html
      ])
    ]
    exec_env = {
      AZURE_AUTHORITY_HOST       = "https://login.microsoftonline.com/"
      AZURE_TENANT_ID            = data.azurerm_client_config.current.tenant_id
      AZURE_CLIENT_ID            = data.azurerm_client_config.current.client_id
      AZURE_FEDERATED_TOKEN_FILE = local.github_id_token_azure_filename
    }
  }
}

provider "kubernetes" {
  host                   = local.kubernetes_credentials.host
  cluster_ca_certificate = local.kubernetes_credentials.cluster_ca_certificate
  exec {
    api_version = local.kubernetes_credentials.exec_api_version
    command     = local.kubernetes_credentials.exec_command
    args        = local.kubernetes_credentials.exec_args
    env         = local.kubernetes_credentials.exec_env
  }
}

These two Terraform variables are populated by the environment variables (I do this in Terragrunt):

  github_id_token_request_url   = get_env("ACTIONS_ID_TOKEN_REQUEST_URL")
  github_id_token_request_token = get_env("ACTIONS_ID_TOKEN_REQUEST_TOKEN")
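Without Terragrunt, plain Terraform can populate the same two variables, since it reads an input variable `foo` from an environment variable named `TF_VAR_foo`. A sketch, assuming it runs in a GitHub Actions job with `permissions: id-token: write` (otherwise the `ACTIONS_ID_TOKEN_REQUEST_*` variables are not set); the function name is hypothetical:

```shell
# Forward the GitHub Actions OIDC request values to the two Terraform input
# variables used above (Terraform reads TF_VAR_<name> for variable "<name>").
forward_github_oidc_vars() {
  # ${VAR:?msg} aborts with an error if the variable is unset or empty.
  export TF_VAR_github_id_token_request_url="${ACTIONS_ID_TOKEN_REQUEST_URL:?requires id-token: write permission}"
  export TF_VAR_github_id_token_request_token="${ACTIONS_ID_TOKEN_REQUEST_TOKEN:?requires id-token: write permission}"
}
```

Usage would be along the lines of `forward_github_oidc_vars && terraform plan`.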

It's this ugly because kubelogin get-token requires the federated ID token to be written to a file beforehand, so we need a way to force Terraform to write that file every time it starts the provider, including during the plan phase.

I can't think of a way of doing this that doesn't involve a shell command - feel free to propose better ways if you can think of them!
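The file requirement can at least be isolated in a small shell helper that writes the token to a temporary file, points `AZURE_FEDERATED_TOKEN_FILE` at it for a single command, deletes it afterwards, and passes through the command's exit status so a failing kubelogin is not masked. A sketch under those assumptions; `with_federated_token_file` is a hypothetical name, not part of kubelogin:

```shell
# Run a command with AZURE_FEDERATED_TOKEN_FILE pointing at a temporary file
# holding the given ID token; remove the file afterwards and return the
# command's own exit status.
with_federated_token_file() {
  token=$1
  shift
  # mktemp creates the file readable/writable by the owner only.
  tmp=$(mktemp) || return 1
  printf '%s' "$token" > "$tmp"
  status=0
  AZURE_FEDERATED_TOKEN_FILE=$tmp "$@" || status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example: `with_federated_token_file "$ID_TOKEN" kubelogin get-token --login workloadidentity --server-id 6dae42f8-4368-4678-94ff-3960e28e3630`, with `AZURE_CLIENT_ID`, `AZURE_TENANT_ID` and `AZURE_AUTHORITY_HOST` exported as in `exec_env` above; `$ID_TOKEN` stands for the token fetched from GitHub.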

It would be great if there was a cleaner way to do this. I suppose the question is - do we expect the kubernetes provider to provide a wrapper for this logic for all major cloud platforms, or should this functionality be implemented upstream by the cloud platform's client-go plugin?

For instance, if kubelogin had --federated-token-request-url and --federated-token-request-token as options, that would make this a LOT cleaner - or even better, just --federated-token-provider github.

I can't find any existing issues suggesting this - want me to create one?

jtv8 · Nov 26 '24

Rereading your issue, it's weird that there's already a valid JWT with the correct audience - which is the hard part - but it isn't working with kubelogin. That warrants some investigation.

I've had issues before if I use an ID token that was issued before the AKS cluster was created. I have a theory that there's some logic somewhere that checks that the iat claim doesn't pre-date the creation timestamp of the cluster or the managed identity. Could it be that?

jtv8 · Nov 26 '24

I don't know if this is related, but for me it was straightforward. I have seen many setups that seemed complex to me, like https://github.com/neumanndaniel/terraform/blob/master/modules/kubelogin/main.tf or the options proposed here.

In my case, I only had to

# Login to Azure
az login
# Get AKS Creds
az aks get-credentials --resource-group "rg-XYZ" --name "aks-XYZ" --overwrite-existing
# Convert kubeconfig
kubelogin convert-kubeconfig -l azurecli

In Terraform I only had to define the following provider config

provider "kubernetes" {
  config_path    = "~/.kube/config"
  config_context = "aks-XYZ"
}

and I was able to apply this example

resource "kubernetes_namespace" "example" {
  metadata {
    name = "my-first-namespace"
  }
}

Hope this might help others, at least for local deployments. I'd also expect that if you convert the kubeconfig via kubelogin convert-kubeconfig -l to other login methods (https://azure.github.io/kubelogin/cli/convert-kubeconfig.html), Terraform should be able to use them as well.

choeffer · Dec 10 '24

@jtv8 Thank you very much for your workaround; it works for me with the helm provider, which uses nearly the same syntax.

I adjusted it a bit to ensure the token file is removed after authentication and to read the environment variables from within the Terraform run.

data "external" "env_github_id_token_request" {
  program = ["/bin/bash", "-c", "echo {\\\"url\\\":\\\"$ACTIONS_ID_TOKEN_REQUEST_URL\\\", \\\"token\\\":\\\"$ACTIONS_ID_TOKEN_REQUEST_TOKEN\\\"}"]
}

data "http" "github_id_token_azure" {
  url = "${data.external.env_github_id_token_request.result["url"]}&audience=api://AzureADTokenExchange"
  request_headers = {
    Authorization = "bearer ${data.external.env_github_id_token_request.result["token"]}"
    Accept        = "application/json; api-version=2.0"
  }
}

provider "helm" {
  kubernetes = {
    host                   = one(data.azurerm_kubernetes_cluster.aks.kube_config).host
    cluster_ca_certificate = base64decode(one(data.azurerm_kubernetes_cluster.aks.kube_config).cluster_ca_certificate)

    exec = {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "/bin/bash"
      args = [
        "-c",
        <<EOT
        tempfile=$(mktemp)
        # Delete the token file on exit; unlike a trailing rm, this keeps kubelogin's exit status
        trap 'rm -f "$tempfile"' EXIT
        export AZURE_FEDERATED_TOKEN_FILE="$tempfile"
        echo "${jsondecode(data.http.github_id_token_azure.response_body).value}" > "$tempfile"
        kubelogin get-token \
          --login workloadidentity \
          --server-id 6dae42f8-4368-4678-94ff-3960e28e3630 # See https://azure.github.io/kubelogin/concepts/aks.html
        EOT
      ]
      env = {
        AZURE_AUTHORITY_HOST = "https://login.microsoftonline.com/"
        AZURE_TENANT_ID      = data.azurerm_client_config.current.tenant_id
        AZURE_CLIENT_ID      = data.azurerm_client_config.current.client_id
      }
    }
  }
}

CoolDuke · Mar 19 '25

Adjusted @CoolDuke's workaround; this does the job in TFC without the kubelogin binary:

provider "helm" {
  kubernetes = {
    host                   = one(data.azurerm_kubernetes_cluster.aks.kube_config).host
    cluster_ca_certificate = base64decode(one(data.azurerm_kubernetes_cluster.aks.kube_config).cluster_ca_certificate)

    exec = {
      api_version = "client.authentication.k8s.io/v1"
      command     = "sh"
      args = [
        "-c",
        <<-EOT
          token_response=$(curl -sS -X POST \
            -d "client_id=$ARM_CLIENT_ID" \
            -d "client_assertion_type=urn:ietf:params:oauth:client-assertion-type:jwt-bearer" \
            -d "client_assertion=$TFC_WORKLOAD_IDENTITY_TOKEN" \
            -d "grant_type=client_credentials" \
            -d "scope=6dae42f8-4368-4678-94ff-3960e28e3630/.default" \
            "https://login.microsoftonline.com/$ARM_TENANT_ID/oauth2/v2.0/token")
          expires_in=$(echo "$token_response" | jq -r '.expires_in')
          expiry_date=$(date -u -d "now + $expires_in seconds" '+%Y-%m-%dT%H:%M:%SZ')
          echo "$token_response" \
          | jq -r --arg server "${one(data.azurerm_kubernetes_cluster.aks.kube_config).host}" --arg expiry_date "$expiry_date" '
              {
                apiVersion: "client.authentication.k8s.io/v1",
                kind: "ExecCredential",
                spec: {
                  interactive: false,
                  cluster: {
                    server: $server,
                    insecureSkipTlsVerify: false
                  }
                },
                status: {
                  token: .access_token,
                  expirationTimestamp: $expiry_date
                }
              }
            '
        EOT
      ]
    }
  }
}

This ensures the token is not stored in the Terraform state, and it follows the Client Authentication (v1/v1beta1) recommendations.

[!IMPORTANT]
Make sure to set ARM_CLIENT_ID and ARM_TENANT_ID as TFC environment variables, since the script reads them from the environment.

[!TIP] When setting up federated identity credentials (FIC) for TFC, remember that the plan and apply phases present different subject claims during OIDC, so you have to configure a separate FIC for each phase.
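The exec plugin contract these scripts rely on is small: client-go runs the command, reads an `ExecCredential` object from stdout, uses `status.token` as the bearer token, and caches it until `status.expirationTimestamp`. A minimal sketch of just that final step without jq, assuming a JWT access token (base64url characters and dots only, so no JSON escaping is needed) and a `date` that accepts `-d @EPOCH` (GNU coreutils or BusyBox); the function name is hypothetical. It emits only `apiVersion`, `kind` and `status`, which is the output shape shown in the Kubernetes client-authentication documentation:

```shell
# Print a client.authentication.k8s.io/v1 ExecCredential on stdout.
# $1: bearer token (e.g. .access_token from the Entra ID token response)
# $2: token lifetime in seconds (e.g. .expires_in from the same response)
exec_credential() {
  token=$1
  expires_in=$2
  # "date -d @EPOCH" works with GNU coreutils and BusyBox; BSD/macOS date differs.
  expiry=$(date -u -d "@$(( $(date +%s) + $expires_in ))" '+%Y-%m-%dT%H:%M:%SZ') || return 1
  # A JWT contains only base64url characters and dots, so it is safe to embed as-is.
  printf '{"apiVersion":"client.authentication.k8s.io/v1","kind":"ExecCredential","status":{"token":"%s","expirationTimestamp":"%s"}}\n' \
    "$token" "$expiry"
}
```

Inside a script like the one above, this could replace the final jq step, e.g. `exec_credential "$(echo "$token_response" | jq -r .access_token)" "$(echo "$token_response" | jq -r .expires_in)"`.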

sbx0r · Apr 11 '25

After too much trial and error, I adapted @sbx0r's script to work with dynamic Azure credentials for AKS. It also contains error handling via exit codes, because these exec plugins are seemingly impossible to debug in Terraform.

provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.aks_cluster.kube_config[0].host
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks_cluster.kube_config[0].cluster_ca_certificate)

  exec {
    api_version = "client.authentication.k8s.io/v1"
    command     = "sh"
    args = [
      "-c",
      <<-EOT
        # Exit codes:
        # 1: General error
        # 10: Missing or invalid client ID file
        # 11: Missing or invalid OIDC token file
        # 20: Curl request failed
        # 21: Invalid JSON response
        # 22: Token missing from response
        # 30: Date calculation failed
        # 31: JQ processing failed

        set -e  # Exit on any error

        # Validate required files exist
        if [ ! -f "${var.tfc_azure_dynamic_credentials.default.client_id_file_path}" ]; then
            echo "Error: Client ID file not found" >&2
            exit 10
        fi

        if [ ! -f "${var.tfc_azure_dynamic_credentials.default.oidc_token_file_path}" ]; then
            echo "Error: OIDC token file not found" >&2
            exit 11
        fi

        # Get the client ID from the TFC dynamic credentials file
        client_id=$(cat "${var.tfc_azure_dynamic_credentials.default.client_id_file_path}") || exit 10
        tenant_id="${data.azurerm_subscription.current.tenant_id}"

        # Perform the OAuth 2.0 token exchange with error handling
        token_response=$(curl -sS -f -X POST \
          -d "client_id=$client_id" \
          -d "client_assertion_type=urn:ietf:params:oauth:client-assertion-type:jwt-bearer" \
          -d "client_assertion=$(cat "${var.tfc_azure_dynamic_credentials.default.oidc_token_file_path}")" \
          -d "grant_type=client_credentials" \
          -d "scope=6dae42f8-4368-4678-94ff-3960e28e3630/.default" \
          "https://login.microsoftonline.com/$tenant_id/oauth2/v2.0/token" 2>&1) || {
            echo "Failed to obtain token: $token_response" >&2
            exit 20
        }

        # Validate JSON response
        if ! echo "$token_response" | jq . >/dev/null 2>&1; then
            echo "Invalid JSON response: $token_response" >&2
            exit 21
        fi

        # Check if access_token exists in response
        if [ "$(echo "$token_response" | jq -r '.access_token')" = "null" ]; then
            echo "No access token in response" >&2
            exit 22
        fi

        # Calculate expiry time for the token
        expires_in=$(echo "$token_response" | jq -r '.expires_in') || {
            echo "Failed to extract expires_in from response" >&2
            exit 31
        }

        expiry_date=$(date -u -d "now + $expires_in seconds" '+%Y-%m-%dT%H:%M:%SZ') || {
            echo "Failed to calculate expiry date" >&2
            exit 30
        }

        # Format the response as a Kubernetes ExecCredential
        echo "$token_response" \
        | jq -r --arg server "${data.azurerm_kubernetes_cluster.aks_cluster.kube_config[0].host}" --arg expiry_date "$expiry_date" '
            {
              apiVersion: "client.authentication.k8s.io/v1",
              kind: "ExecCredential",
              spec: {
                interactive: false,
                cluster: {
                  server: $server,
                  insecureSkipTlsVerify: false
                }
              },
              status: {
                token: .access_token,
                expirationTimestamp: $expiry_date
              }
            }
          ' || {
            echo "Failed to generate ExecCredential JSON" >&2
            exit 31
          }
      EOT
    ]
  }
}

starcraft66 · May 02 '25

Let's call it a day 😀

sbx0r · May 02 '25