terraform-provider-kubernetes
Authenticate `kubernetes` provider to AKS using Terraform Cloud Dynamic Credentials for Azure
Description
One-liner:
- I want the `kubernetes` Terraform provider to work with Entra-enabled AKS without managing any secrets (just OIDC federations).
Scenario:
- Have an AKS cluster, pre-created in a separate Terraform codebase/run, with managed Entra ID integration.
- Creating a Terraform module that utilizes both the `azurerm` and `kubernetes` providers, for onboarding new apps/APIs into the AKS cluster (`azurerm_user_assigned_identity`, `kubernetes_namespace_v1`, `kubernetes_service_account_v1`, etc.)
- Using Terraform Cloud with a workspace that is configured with Dynamic Credentials for Azure, and it authenticates the `azurerm` provider perfectly
- The Azure identity being targeted for dynamic credentials holds:
  - the `Owner` role on the resource group where the `azurerm` resources go
  - the `Azure Kubernetes Service RBAC Cluster Admin` role, sufficient to make any changes through the Kubernetes API of the AKS cluster
Manual version illustrating a similar idea:

```shell
# Login to Azure
az login # use whatever details/parameters for your environment

# Convert kubeconfig to inherit the Azure CLI credential you've already established
# This switches kubeconfig to use an `exec` to `kubelogin`
kubelogin convert-kubeconfig -l azurecli

# Now, do stuff with kubectl
kubectl get nodes -o wide

# Each call of kubectl runs `kubelogin get-token` to get a short-lived credential, inheriting the identity already captured for Azure
```
Goal:
- The `kubernetes` Terraform provider is able to take on the same identity being pulled in by the `azurerm` provider, using that identity to call the AKS cluster's Kubernetes API when provisioning `kubernetes_*` resources
- Have zero secrets to store/rotate/protect (as is accomplished by the `azurerm` provider federating via OIDC)
Potential Terraform Configuration
I can imagine two ways to do this:
Option 1: kubernetes provider can be told to use the same Azure Dynamic Credentials as the azurerm provider
```hcl
terraform {
  cloud {
    organization = "my-org"
    workspaces {
      name = "this-workspace" # this workspace is set up for dynamic azure credentials
    }
  }
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.113.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.31.0"
    }
  }
}

provider "azurerm" {
  features {
    # Empty, we don't need anything special, but this block has to be here
  }
  # the main provider configuration comes from the following environment variables being set in the TFC workspace, per
  # https://developer.hashicorp.com/terraform/cloud-docs/workspaces/dynamic-provider-credentials/azure-configuration#configure-the-azurerm-or-microsoft-entra-id-provider
  #
  # ARM_TENANT_ID           = <our tenant id>
  # ARM_SUBSCRIPTION_ID     = <our subscription id>
  # TFC_AZURE_PROVIDER_AUTH = true
  # TFC_AZURE_RUN_CLIENT_ID = <the client id of our pipeline credential that is configured to accept oidc>
}

data "azurerm_kubernetes_cluster" "aks" {
  resource_group_name = local.cluster_resource_group_name
  name                = local.cluster_name
}

provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.aks.kube_config.0.host
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config.0.cluster_ca_certificate)

  use_tfc_azure_dynamic_credentials = true # <== this is the thing that would have to be invented, maybe borrowing code from the `azurerm` provider
}

# Off in a module somewhere:
# This resource is provisioned by the `kubernetes` provider, but using the Azure dynamic credential
resource "kubernetes_namespace_v1" "ns" {
  metadata {
    name = local.kubernetes_namespace_name
    labels = {
      # ...
    }
  }
}
```
Option 2: kubernetes provider exchanges the TFC-provided OIDC token on its own:
```hcl
terraform {
  cloud {
    organization = "my-org"
    workspaces {
      name = "this-workspace" # this workspace is set up for dynamic azure credentials
    }
  }
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.113.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.31.0"
    }
  }
}

provider "azurerm" {
  features {
    # Empty, we don't need anything special, but this block has to be here
  }
  # the main provider configuration comes from the following environment variables being set in the TFC workspace, per
  # https://developer.hashicorp.com/terraform/cloud-docs/workspaces/dynamic-provider-credentials/azure-configuration#configure-the-azurerm-or-microsoft-entra-id-provider
  #
  # ARM_TENANT_ID           = <our tenant id>
  # ARM_SUBSCRIPTION_ID     = <our subscription id>
  # TFC_AZURE_PROVIDER_AUTH = true
  # TFC_AZURE_RUN_CLIENT_ID = <the client id of our pipeline credential that is configured to accept oidc>
}

data "azurerm_kubernetes_cluster" "aks" {
  resource_group_name = local.cluster_resource_group_name
  name                = local.cluster_name
}

# https://developer.hashicorp.com/terraform/cloud-docs/workspaces/dynamic-provider-credentials/azure-configuration#required-terraform-variable
# This "magic" variable is populated by the TFC workspace at runtime,
# and is especially required if you have multiple instances of the `azurerm` provider with aliases
variable "tfc_azure_dynamic_credentials" {
  description = "Object containing Azure dynamic credentials configuration"
  type = object({
    default = object({
      client_id_file_path  = string
      oidc_token_file_path = string
    })
    aliases = map(object({
      client_id_file_path  = string
      oidc_token_file_path = string
    }))
  })
}

provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.aks.kube_config.0.host
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config.0.cluster_ca_certificate)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "kubelogin"
    args = [
      "get-token",
      "--environment", "AzurePublicCloud",
      "--server-id", "6dae42f8-4368-4678-94ff-3960e28e3630", # Always the same, https://azure.github.io/kubelogin/concepts/aks.html
      "--client-id", "80faf920-1908-4b52-b5ef-a8e7bedfc67a", # Always the same, https://azure.github.io/kubelogin/concepts/aks.html
      "--tenant-id", data.azurerm_kubernetes_cluster.aks.azure_active_directory_role_based_access_control.0.tenant_id,
      "--authority-host", "https://login.microsoftonline.com/${data.azurerm_kubernetes_cluster.aks.azure_active_directory_role_based_access_control.0.tenant_id}", # or something similar, if it would work
      "--login", "workloadidentity",
      "--federated-token-file", var.tfc_azure_dynamic_credentials.default.oidc_token_file_path
    ]
  }
}

# Off in a module somewhere:
# This resource is provisioned by the `kubernetes` provider, but using the Azure dynamic credential
resource "kubernetes_namespace_v1" "ns" {
  metadata {
    name = local.kubernetes_namespace_name
    labels = {
      # ...
    }
  }
}
```
Notes:
- 📝 This option requires `kubelogin` to be available within the context of the Terraform run. We need a self-hosted TFC agent anyway, due to use of a private cluster (the TFC-provided agents wouldn't have line-of-sight to the Kubernetes API), and have installed `kubelogin` ourselves.
- When the Azure Dynamic Credentials are set up, TFC places a valid JWT at the path `/home/tfc-agent/.tfc-agent/component/terraform/runs/{run-id-here}/tfc-azure-token`, with an issuer of `https://app.terraform.io` and an audience of `api://AzureADTokenExchange`, but using that JWT with `kubelogin` isn't working.
- If I manually run the `kubelogin get-token` command as specified in my kubeconfig after `kubelogin convert-kubeconfig -l azurecli`, I get a JWT with an issuer of `https://sts.windows.net/{my-tenant-id-here}/` and an audience of `6dae42f8-4368-4678-94ff-3960e28e3630`, which is the static Entra ID for the AKS OIDC application that is the same for every customer. I believe this JWT is what is being submitted with calls to the Kubernetes API.
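A quick way to compare the two JWTs described above is to decode their payloads locally and inspect the `iss` and `aud` claims. A minimal sketch (standard library only; it deliberately does not verify the signature, so it is only for inspecting claims):

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying the signature.
    Only suitable for inspecting claims such as `iss` and `aud`."""
    payload_b64 = token.split(".")[1]
    # JWT segments are unpadded base64url; restore the padding before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Demonstrate with a locally crafted (unsigned) token shaped like the TFC one:
header = base64.urlsafe_b64encode(b'{"alg":"none"}').rstrip(b"=").decode()
payload = base64.urlsafe_b64encode(json.dumps(
    {"iss": "https://app.terraform.io", "aud": "api://AzureADTokenExchange"}
).encode()).rstrip(b"=").decode()

claims = jwt_claims(f"{header}.{payload}.")
print(claims["iss"], claims["aud"])
# → https://app.terraform.io api://AzureADTokenExchange
```

Running `jwt_claims` over the token at the `tfc-azure-token` path versus the one produced by `kubelogin get-token` makes the issuer/audience mismatch easy to see.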
References
- relates #2072
- relates hashicorp/terraform-provider-helm#1114
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Hi,
I just want to make sure that I understand exactly what the ask is here. Would using a configuration like the one you presented in Option 2, which would work, not meet your expectations?
@alexsomesan Thanks for the reply - Option 2 would totally meet the need (if it worked), and at that point I would propose that this issue could be solved with just a documentation update showing a workable solution using the `exec { }` method. As-is, at least when I tried it, I couldn't get the provider to authenticate using the token that TFC presents at that location.
Option 1 would also meet the need, and would rhyme more with how the `azurerm` and `azuread` providers are able to infer their authentication using just a couple of environment variables that feed the Azure SDK in the provider.
However it is accomplished, the ultimate goal is that the `kubernetes` provider (like `azurerm` and `azuread`) could authenticate to an AKS cluster using TFC dynamic Azure credentials.
I have a somewhat hacky workaround for this (in my case, using a GitHub Actions ID token as the credential):
```hcl
locals {
  github_id_token_azure_filename = "/tmp/.github-id-token-azure"
}

data "azurerm_client_config" "current" {
}

data "azurerm_kubernetes_cluster" "aks" {
  name                = var.aks_name
  resource_group_name = var.aks_resource_group
}

data "http" "github_id_token_azure" {
  url = "${var.github_id_token_request_url}&audience=api://AzureADTokenExchange"
  request_headers = {
    Authorization = "bearer ${var.github_id_token_request_token}"
    Accept        = "application/json; api-version=2.0"
  }
}

locals {
  kubernetes_credentials = {
    host = one(data.azurerm_kubernetes_cluster.aks.kube_config).host
    cluster_ca_certificate = base64decode(
      one(data.azurerm_kubernetes_cluster.aks.kube_config)
      .cluster_ca_certificate
    )
    exec_api_version = "client.authentication.k8s.io/v1beta1"
    exec_command     = "/bin/bash"
    exec_args = [
      "-c",
      join(" ", [
        "echo \"${jsondecode(data.http.github_id_token_azure.response_body).value}\"",
        "> ${local.github_id_token_azure_filename}",
        "&&",
        "kubelogin",
        "get-token",
        "--login",
        "workloadidentity",
        "--server-id",
        "6dae42f8-4368-4678-94ff-3960e28e3630", # See https://azure.github.io/kubelogin/concepts/aks.html
      ])
    ]
    exec_env = {
      AZURE_AUTHORITY_HOST       = "https://login.microsoftonline.com/"
      AZURE_TENANT_ID            = data.azurerm_client_config.current.tenant_id
      AZURE_CLIENT_ID            = data.azurerm_client_config.current.client_id
      AZURE_FEDERATED_TOKEN_FILE = local.github_id_token_azure_filename
    }
  }
}

provider "kubernetes" {
  host                   = local.kubernetes_credentials.host
  cluster_ca_certificate = local.kubernetes_credentials.cluster_ca_certificate
  exec {
    api_version = local.kubernetes_credentials.exec_api_version
    command     = local.kubernetes_credentials.exec_command
    args        = local.kubernetes_credentials.exec_args
    env         = local.kubernetes_credentials.exec_env
  }
}
```
These two Terraform variables are populated by the environment variables (I do this in Terragrunt):

```hcl
github_id_token_request_url   = get_env("ACTIONS_ID_TOKEN_REQUEST_URL")
github_id_token_request_token = get_env("ACTIONS_ID_TOKEN_REQUEST_TOKEN")
```
The reason it's as ugly as it is: `kubelogin get-token` requires the federated ID token to be written to a file beforehand, so we need a way to force Terraform to write that file every time it starts the provider - including during the plan phase.
I can't think of a way of doing this that doesn't involve a shell command - feel free to propose better ways if you can think of them!
It would be great if there was a cleaner way to do this. I suppose the question is - do we expect the `kubernetes` provider to provide a wrapper for this logic for all major cloud platforms, or should this functionality be implemented upstream by the cloud platform's client-go plugin?
For instance, if `kubelogin` had `--federated-token-request-url` and `--federated-token-request-token` as options, that would make this a LOT cleaner - or even better, just `--federated-token-provider github`.
I can't find any existing issues suggesting this - want me to create one?
Rereading your issue, it's weird that there's already a valid JWT with the correct audience - which is the hard part - but it isn't working with `kubelogin`. That warrants some investigation.
I've had issues before if I use an ID token that was issued before the AKS cluster was created. I have a theory that there's some logic somewhere that checks that the `iat` claim doesn't pre-date the creation timestamp of the cluster or the managed identity. Could it be that?
I don't know if this is related, but for me it was straightforward. I have seen many (for me) complex setups like https://github.com/neumanndaniel/terraform/blob/master/modules/kubelogin/main.tf or the options proposed here.
In my case, I only had to:

```shell
# Login to Azure
az login

# Get AKS creds
az aks get-credentials --resource-group "rg-XYZ" --name "aks-XYZ" --overwrite-existing

# Convert kubeconfig
kubelogin convert-kubeconfig -l azurecli
```
In Terraform I only had to define the following provider config:

```hcl
provider "kubernetes" {
  config_path    = "~/.kube/config"
  config_context = "aks-XYZ"
}
```
and I was able to apply this example:

```hcl
resource "kubernetes_namespace" "example" {
  metadata {
    name = "my-first-namespace"
  }
}
```
Hope this might help others, at least for local deployments. But I expect that if you convert the kubeconfig via `kubelogin convert-kubeconfig -l` to other methods (https://azure.github.io/kubelogin/cli/convert-kubeconfig.html), Terraform should be able to use them as well.
@jtv8 Thank you very much for your workaround; it's working for me with the helm provider, which uses nearly the same syntax.
I adjusted it a bit to ensure the token file is removed after authentication, and to read the environment variables from within the Terraform run.
```hcl
data "external" "env_github_id_token_request" {
  program = ["/bin/bash", "-c", "echo {\\\"url\\\":\\\"$ACTIONS_ID_TOKEN_REQUEST_URL\\\", \\\"token\\\":\\\"$ACTIONS_ID_TOKEN_REQUEST_TOKEN\\\"}"]
}

data "http" "github_id_token_azure" {
  url = "${data.external.env_github_id_token_request.result["url"]}&audience=api://AzureADTokenExchange"
  request_headers = {
    Authorization = "bearer ${data.external.env_github_id_token_request.result["token"]}"
    Accept        = "application/json; api-version=2.0"
  }
}

provider "helm" {
  kubernetes = {
    host                   = one(data.azurerm_kubernetes_cluster.aks.kube_config).host
    cluster_ca_certificate = base64decode(one(data.azurerm_kubernetes_cluster.aks.kube_config).cluster_ca_certificate)
    exec = {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "/bin/bash"
      args = [
        "-c",
        <<-EOT
          tempfile=$(mktemp)
          export AZURE_FEDERATED_TOKEN_FILE="$tempfile"
          echo "${jsondecode(data.http.github_id_token_azure.response_body).value}" > $tempfile
          kubelogin get-token \
            --login workloadidentity \
            --server-id 6dae42f8-4368-4678-94ff-3960e28e3630 # See https://azure.github.io/kubelogin/concepts/aks.html
          rm $tempfile
        EOT
      ]
      env = {
        AZURE_AUTHORITY_HOST = "https://login.microsoftonline.com/"
        AZURE_TENANT_ID      = data.azurerm_client_config.current.tenant_id
        AZURE_CLIENT_ID      = data.azurerm_client_config.current.client_id
      }
    }
  }
}
```
Adjusted @CoolDuke's workaround, and this does the job in TFC (without the kubelogin binary):

```hcl
provider "helm" {
  kubernetes = {
    host                   = one(data.azurerm_kubernetes_cluster.aks.kube_config).host
    cluster_ca_certificate = base64decode(one(data.azurerm_kubernetes_cluster.aks.kube_config).cluster_ca_certificate)
    exec = {
      api_version = "client.authentication.k8s.io/v1"
      command     = "sh"
      args = [
        "-c",
        <<-EOT
          token_response=$(curl -sS -X POST \
            -d "client_id=$ARM_CLIENT_ID" \
            -d "client_assertion_type=urn:ietf:params:oauth:client-assertion-type:jwt-bearer" \
            -d "client_assertion=$TFC_WORKLOAD_IDENTITY_TOKEN" \
            -d "grant_type=client_credentials" \
            -d "scope=6dae42f8-4368-4678-94ff-3960e28e3630/.default" \
            "https://login.microsoftonline.com/$ARM_TENANT_ID/oauth2/v2.0/token")

          expires_in=$(echo "$token_response" | jq -r '.expires_in')
          expiry_date=$(date -u -d "now + $expires_in seconds" '+%Y-%m-%dT%H:%M:%SZ')

          echo "$token_response" \
            | jq -r --arg server "${one(data.azurerm_kubernetes_cluster.aks.kube_config).host}" --arg expiry_date "$expiry_date" '
            {
              apiVersion: "client.authentication.k8s.io/v1",
              kind: "ExecCredential",
              spec: {
                interactive: false,
                cluster: {
                  server: $server,
                  insecureSkipTlsVerify: false
                }
              },
              status: {
                token: .access_token,
                expirationTimestamp: $expiry_date
              }
            }
            '
        EOT
      ]
    }
  }
}
```
this ensures that the token result will not be stored in state, and follows the Client Authentication (v1)/(v1beta1) recommendations.

> [!IMPORTANT]
> Make sure to set `ARM_CLIENT_ID` and `ARM_TENANT_ID` as TFC variables.

> [!TIP]
> When setting up federated identity credentials (FIC) for TFC, remember that `plan` and `run` will generate different subject claims during OIDC, so you have to configure a separate FIC for each scenario.
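The per-phase subject claims can be derived from the workspace coordinates. Below is a sketch of creating one federated credential per run phase with the Azure CLI, printed as a dry run; the identity name, resource group, and org/project/workspace values are placeholders, and the exact subject format should be verified against the TFC dynamic credentials docs:

```shell
ORG="my-org"; PROJECT="my-project"; WORKSPACE="this-workspace"

for PHASE in plan apply; do
  SUBJECT="organization:$ORG:project:$PROJECT:workspace:$WORKSPACE:run_phase:$PHASE"
  # Dry run: remove the leading echo to actually create the credentials
  echo az identity federated-credential create \
    --name "tfc-$WORKSPACE-$PHASE" \
    --identity-name "my-pipeline-identity" \
    --resource-group "my-rg" \
    --issuer "https://app.terraform.io" \
    --subject "$SUBJECT" \
    --audiences "api://AzureADTokenExchange"
done
```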
After too much trial and error, I adapted @sbx0r's script to work with dynamic Azure credentials for use with AKS. It also contains error handling via exit codes, because these exec plugins are seemingly impossible to debug in Terraform.
```hcl
provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.aks_cluster.kube_config[0].host
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks_cluster.kube_config[0].cluster_ca_certificate)
  exec {
    api_version = "client.authentication.k8s.io/v1"
    command     = "sh"
    args = [
      "-c",
      <<-EOT
        # Exit codes:
        #  1: General error
        # 10: Missing or invalid client ID file
        # 11: Missing or invalid OIDC token file
        # 20: Curl request failed
        # 21: Invalid JSON response
        # 22: Token missing from response
        # 30: Date calculation failed
        # 31: JQ processing failed

        set -e # Exit on any error

        # Validate required files exist
        if [ ! -f "${var.tfc_azure_dynamic_credentials.default.client_id_file_path}" ]; then
          echo "Error: Client ID file not found" >&2
          exit 10
        fi
        if [ ! -f "${var.tfc_azure_dynamic_credentials.default.oidc_token_file_path}" ]; then
          echo "Error: OIDC token file not found" >&2
          exit 11
        fi

        # Get the client ID from the TFC dynamic credentials file
        client_id=$(cat "${var.tfc_azure_dynamic_credentials.default.client_id_file_path}") || exit 10
        tenant_id="${data.azurerm_subscription.current.tenant_id}"

        # Perform the OAuth 2.0 token exchange with error handling
        token_response=$(curl -sS -f -X POST \
          -d "client_id=$client_id" \
          -d "client_assertion_type=urn:ietf:params:oauth:client-assertion-type:jwt-bearer" \
          -d "client_assertion=$(cat "${var.tfc_azure_dynamic_credentials.default.oidc_token_file_path}")" \
          -d "grant_type=client_credentials" \
          -d "scope=6dae42f8-4368-4678-94ff-3960e28e3630/.default" \
          "https://login.microsoftonline.com/$tenant_id/oauth2/v2.0/token" 2>&1) || {
          echo "Failed to obtain token: $token_response" >&2
          exit 20
        }

        # Validate JSON response
        if ! echo "$token_response" | jq . >/dev/null 2>&1; then
          echo "Invalid JSON response: $token_response" >&2
          exit 21
        fi

        # Check if access_token exists in response
        if [ "$(echo "$token_response" | jq -r '.access_token')" = "null" ]; then
          echo "No access token in response" >&2
          exit 22
        fi

        # Calculate expiry time for the token
        expires_in=$(echo "$token_response" | jq -r '.expires_in') || {
          echo "Failed to extract expires_in from response" >&2
          exit 31
        }
        expiry_date=$(date -u -d "now + $expires_in seconds" '+%Y-%m-%dT%H:%M:%SZ') || {
          echo "Failed to calculate expiry date" >&2
          exit 30
        }

        # Format the response as a Kubernetes ExecCredential
        echo "$token_response" \
          | jq -r --arg server "${data.azurerm_kubernetes_cluster.aks_cluster.kube_config[0].host}" --arg expiry_date "$expiry_date" '
          {
            apiVersion: "client.authentication.k8s.io/v1",
            kind: "ExecCredential",
            spec: {
              interactive: false,
              cluster: {
                server: $server,
                insecureSkipTlsVerify: false
              }
            },
            status: {
              token: .access_token,
              expirationTimestamp: $expiry_date
            }
          }
          ' || {
          echo "Failed to generate ExecCredential JSON" >&2
          exit 31
        }
      EOT
    ]
  }
}
```
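For what it's worth, the ExecCredential document that the jq pipeline emits can be sanity-checked outside Terraform. A minimal sketch that mirrors the same shape (the token-response dict and server URL here are stand-ins, not real values):

```python
import json
from datetime import datetime, timedelta, timezone

def exec_credential(token_response: dict, server: str) -> str:
    """Build a client.authentication.k8s.io/v1 ExecCredential document,
    mirroring what the jq pipeline above produces."""
    expiry = datetime.now(timezone.utc) + timedelta(
        seconds=int(token_response["expires_in"]))
    return json.dumps({
        "apiVersion": "client.authentication.k8s.io/v1",
        "kind": "ExecCredential",
        "spec": {
            "interactive": False,
            "cluster": {"server": server, "insecureSkipTlsVerify": False},
        },
        "status": {
            "token": token_response["access_token"],
            "expirationTimestamp": expiry.strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
    }, indent=2)

# Stand-in response, matching the fields the script extracts with jq:
print(exec_credential({"access_token": "eyJhbGciOi...", "expires_in": 3599},
                      "https://my-cluster.example.azmk8s.io:443"))
```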
Let's call it a day 😀