terraform-provider-kubernetes
Unable to use kubernetes provider with fixed limited permissions - see here: https://github.com/hashicorp/terraform-provider-azurerm/pull/21229
Terraform Version, Provider Version and Kubernetes Version
Terraform version: 1.4.4
Kubernetes provider version: 2.19.0
Kubernetes version: 1.24.9
AzureRM provider: 3.51.0
Affected Resource(s)
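- provider "kubernetes" (authentication)
- data "kubernetes_namespace_v1"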
Terraform Configuration Files
# Configure the Kubernetes provider
provider "kubernetes" {
host = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.host
username = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.username
password = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.password
client_certificate = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.client_certificate)
client_key = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.client_key)
cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.cluster_ca_certificate)
}
terraform {
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = ">= 3.51.0"
}
azuread = {
source = "hashicorp/azuread"
version = ">= 2.36.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = ">= 2.19.0"
}
helm = {
source = "hashicorp/helm"
version = "2.9.0"
}
}
required_version = ">= 0.14.9"
backend "azurerm" {
}
}
data "azurerm_kubernetes_cluster" "aks_provider_config" {
name = var.env_config[var.ENV][ "aks_cluster_name" ]
resource_group_name = var.env_config[var.ENV][ "aks_rg_name" ]
}
data "kubernetes_namespace_v1" "proj_ns" {
metadata {
name = local.proj_name
}
}
Debug Output
Planning failed. Terraform encountered an error while generating this plan.
╷
│ Error: Unauthorized
│
│ with data.kubernetes_namespace_v1.proj_ns,
│ on var-proj.tf line 37, in data "kubernetes_namespace_v1" "proj_ns":
│ 37: data "kubernetes_namespace_v1" "proj_ns" {
│
╵
Steps to Reproduce
See here: https://github.com/hashicorp/terraform-provider-azurerm/issues/21183
- Create an AKS cluster with Azure AD auth, RBAC, and local accounts enabled
- Create a service principal
- Assign the principal the Azure Kubernetes Service Cluster User Role to allow fetching the limited-permission kubeconfig
- Assign the principal the Azure Kubernetes Service RBAC Admin role scoped to a specific namespace (a sketch of these role assignments follows this list)
- Authenticate Terraform with that service principal and configure the k8s provider
- Try to fetch the data.kubernetes_namespace_v1
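For illustration, a minimal Terraform sketch of the role assignments from the steps above; the resource and principal names are hypothetical, and namespace-scoped Azure RBAC assignments are assumed to use the cluster ID suffixed with /namespaces/<name>, as the Azure CLI does:
resource "azurerm_role_assignment" "aks_cluster_user" {
  # Lets the service principal fetch the limited-permission kubeconfig
  scope                = azurerm_kubernetes_cluster.example.id
  role_definition_name = "Azure Kubernetes Service Cluster User Role"
  principal_id         = azuread_service_principal.pipeline.object_id
}
resource "azurerm_role_assignment" "aks_rbac_admin_ns" {
  # Namespace-scoped admin rights inside the cluster
  scope                = "${azurerm_kubernetes_cluster.example.id}/namespaces/proj-ns"
  role_definition_name = "Azure Kubernetes Service RBAC Admin"
  principal_id         = azuread_service_principal.pipeline.object_id
}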
Expected Behavior
Kubernetes resources should be fetchable via data sources, and resources should be created according to the limited permissions in the specific namespace.
Actual Behavior
Terraform returns an "Unauthorized" error.
Important Factoids
I did some testing, and the outcome is that fetching the limited-permission kubeconfig now works:
2023-04-11T13:21:45.761Z [DEBUG] provider.terraform-provider-azurerm_v3.51.0_x5: AzureRM Request:
POST /subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.ContainerService/managedClusters/XXX/listClusterUserCredential?api-version=2023-02-02-preview HTTP/1.1
Host: management.azure.com
User-Agent: Go/go1.19.3 (amd64-linux) go-autorest/v14.2.1 hashicorp/go-azure-sdk/managedclusters/2023-02-02-preview HashiCorp Terraform/1.4.4 (+https://www.terraform.io) Terraform Plugin SDK/2.10.1 terraform-provider-azurerm/dev VSTS_2c406b0a-3caf-4961-98e2-e310b237dd52_build_241_0 pid-222c6c49-1b0a-5959-a213-6608f9eb8820
Content-Length: 0
Content-Type: application/json; charset=utf-8
X-Ms-Correlation-Request-Id: 16ef209b-5c71-1f06-9efe-412a949223cd
Accept-Encoding: gzip: timestamp=2023-04-11T13:21:45.761Z
2023-04-11T13:21:45.949Z [DEBUG] provider.terraform-provider-azurerm_v3.51.0_x5: AzureRM Response for https://management.azure.com/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.ContainerService/managedClusters/XXX/listClusterUserCredential?api-version=2023-02-02-preview:
HTTP/2.0 200 OK
Cache-Control: no-cache
Content-Type: application/json
Date: Tue, 11 Apr 2023 13:21:44 GMT
Expires: -1
Pragma: no-cache
Server: nginx
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-Ms-Correlation-Request-Id: 16ef209b-5c71-1f06-9efe-412a949223cd
X-Ms-Ratelimit-Remaining-Subscription-Writes: 1198
X-Ms-Request-Id: 09f91280-9e21-4b60-bb28-1871c7e4a1d2
X-Ms-Routing-Request-Id: WESTEUROPE:20230411T132145Z:c88433b2-eb1d-4cf0-a40d-9f4c3d36dbd7
{
"kubeconfigs": [
{
"name": "clusterUser",
"value": "XXX"
}
]
}: timestamp=2023-04-11T13:21:45.949Z
The base64-decoded kubeconfig (the value) looks correct:
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: XXX
server: https://XXX.azmk8s.io:443
name: XXX
contexts:
- context:
cluster: XXX
user: clusterUser_XXX
name: XXX
current-context: XXX
kind: Config
preferences: {}
users:
- name: clusterUser_XXX
user:
exec:
apiVersion: client.authentication.k8s.io/v1beta1
args:
- get-token
- --environment
- AzurePublicCloud
- --server-id
- XXX
- --client-id
- XXX
- --tenant-id
- XXX
- --login
- devicecode
command: kubelogin
env: null
provideClusterInfo: false
To debug, I tried using the same service principal with kubectl, via the following sequence:
az login --service-principal -u XXX -p XXX --tenant XXX
(This command fetches the same kubeconfig as the Terraform sequence)
az aks get-credentials --name XXX --resource-group XXX --overwrite-existing
(However, when I try to use `kubectl get all -n proj_ns` directly, I get the following:
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code XXX to authenticate.
It only works after I use kubelogin.)
kubelogin convert-kubeconfig -l azurecli
After the kubelogin convert, the kubeconfig under .kube/config looks like this:
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: XXX
server: https://XXX.azmk8s.io:443
name: XXX
contexts:
- context:
cluster: XXX
user: clusterUser_XXX
name: XXX
current-context: XXX
kind: Config
preferences: {}
users:
- name: clusterUser_XXX
user:
exec:
apiVersion: client.authentication.k8s.io/v1beta1
args:
- get-token
- --login
- azurecli
- --server-id
- XXX
command: kubelogin
env: null
provideClusterInfo: false
So I don't know what runs behind the Terraform curtain, but I suspect the kubelogin part of these steps is not accounted for, and we get the "Unauthorized" response because the provider does not get the token from the azurecli context.
References
- https://github.com/hashicorp/terraform-provider-kubernetes/issues/1964
- https://github.com/hashicorp/terraform-provider-azurerm/issues/21183
- https://github.com/hashicorp/terraform-provider-azurerm/pull/21229
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Tagging @browley86 here, I think this time it's a kubernetes-provider issue.
I guess this is also the case for the Helm provider. Linking the related issue: https://github.com/hashicorp/terraform-provider-helm/issues/1114
Hey @slzmruepp, I actually think there is a way to get this working, but the setup has to be done at the provider level. There is a post about this where the user uses the exec plugin that the k8s provider exposes. I haven't had time to give it a go myself, but that was my plan after getting the kubeconfig going using the new API endpoint that was just released. I'll try to get this going in the next few days.
Ok so I got it to work; there is good news and bad news. The bad news is that, given the permissions of the Service Principal, it cannot read the required --server-id field from the Enterprise Application named "Azure Kubernetes Service AAD Server". There is an app registration in Azure, outlined in the blog post, whose Application ID you need for kubelogin:
data "azuread_service_principal" "aks" {
display_name = "Azure Kubernetes Service AAD Server"
}
Which is fed into kubelogin:
"--server-id",
data.azuread_service_principal.aks.application_id,
My Service Principal, out of the box, doesn't have the rights to look up that service principal. Instead, because I had the server id in my own .kube/config file, I took a shortcut and extended the HashiCorp Vault secret to include the app registration's Application ID, which got it working. Here is my provider block:
provider "kubernetes" {
host = data.azurerm_kubernetes_cluster.aks.kube_config[0].host
cluster_ca_certificate = base64decode(
data.azurerm_kubernetes_cluster.aks.kube_config[0].cluster_ca_certificate,
)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "/usr/local/bin/kubelogin"
args = [
"get-token",
"--login",
"spn",
"--environment",
"AzurePublicCloud",
"--tenant-id",
data.vault_generic_secret.service_principal.data["tenantId"],
"--server-id",
data.vault_generic_secret.service_principal.data["azure_k8s_service_app_id"],
"--client-id",
data.vault_generic_secret.service_principal.data["clientId"],
"--client-secret",
data.vault_generic_secret.service_principal.data["clientSecret"]
]
}
}
There may be another way to get the app ID from the initial data source, but I did not see an option on an admittedly quick scan. The Vault workaround is hacky but good enough for me in the short term. Hope that helps.
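For completeness, the vault_generic_secret lookup referenced in the provider block above is a standard Vault provider data source; the path here is only a placeholder:
data "vault_generic_secret" "service_principal" {
  # Placeholder path: wherever the SP credentials and the AAD server app ID are stored
  path = "secret/aks/service-principal"
}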
Thank you, but I consider this a workaround. This is not what the documentation suggests. Is there an effort from HashiCorp to solve this in the provider code?
Also, when the data source in Terraform fetches the kubeconfig from the listClusterUserCredential endpoint, it gets back the kubeconfig as a base64-encoded string. If you look at my post where I show the decoded kubeconfig, it contains the server-id. Is there a way to use the azcli token to authenticate to the cluster in the background?
But I would suggest this is a bug. First, the documentation implies that it works out of the box like this. See here under provider setup: https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/guides/getting-started
Second, there is no documentation about kubelogin use. I am talking about a pure CI/CD approach: we never run Terraform locally, only in the pipelines, so the pipeline agents would have kubelogin available. Isn't it enough to rewrite the kubeconfig in the background to use azcli, basically what kubelogin does? Because when you run Terraform in a pipeline, an azcli token must already exist anyway, right?
From:
users:
- name: clusterUser_XXX
user:
exec:
apiVersion: client.authentication.k8s.io/v1beta1
args:
- get-token
- --environment
- AzurePublicCloud
- --server-id
- XXX
- --client-id
- XXX
- --tenant-id
- XXX
- --login
- devicecode
command: kubelogin
env: null
provideClusterInfo: false
to:
users:
- name: clusterUser_XXX
user:
exec:
apiVersion: client.authentication.k8s.io/v1beta1
args:
- get-token
- --login
- azurecli
- --server-id
- XXX
command: kubelogin
env: null
provideClusterInfo: false
Thanks
@slzmruepp - so uh, I did it. I didn't like it. But yeah:
yamldecode(data.azurerm_kubernetes_cluster.aks.kube_config_raw)["users"][0]["user"]["auth-provider"]["config"]["apiserver-id"]
That will get it without my hack of hard-coding it in a different backend. Some context: I had to plan to a file and then decode the file to actually see where in the JSON the value showed up. Aside from my references to my Vault variable, the only other place it showed up is in kube_config_raw. I then spent the better part of an hour hacking around to get the above. This is definitely not ideal. It feels like the provider should export this in some way and, unless I missed something, it looks like it only shows up in kube_config_raw, which then forces end-users to parse it. I don't know that I would call this a "bug" on the provider side, as kubelogin is the thing forcing this. That said, it would be a very nice feature to have this as an available export on azurerm_kubernetes_cluster, because otherwise people have to do the above hack.
Edit for completeness, here is the final provider block:
provider "kubernetes" {
host = data.azurerm_kubernetes_cluster.aks.kube_config[0].host
cluster_ca_certificate = base64decode(
data.azurerm_kubernetes_cluster.aks.kube_config[0].cluster_ca_certificate,
)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "/usr/local/bin/kubelogin"
args = [
"get-token",
"--login",
"spn",
"--environment",
"AzurePublicCloud",
"--tenant-id",
data.vault_generic_secret.service_principal.data["tenantId"],
"--server-id",
yamldecode(data.azurerm_kubernetes_cluster.aks.kube_config_raw)["users"][0]["user"]["auth-provider"]["config"]["apiserver-id"],
"--client-id",
data.vault_generic_secret.service_principal.data["clientId"],
"--client-secret",
data.vault_generic_secret.service_principal.data["clientSecret"]
]
}
}
Haha, yes this is quite a bit. The problem with Azure DevOps Pipelines and the Terraform tasks is that there is no access to the SP secret; that is all handled in the background. Maybe one approach would be to fetch the kubeconfig and put it on the filesystem, then run kubelogin convert-kubeconfig -l azurecli, and in the provider section just refer to this kubeconfig file. But how can such things run before provider init?
@browley86 thanks for the effort, but I don't think this is a feasible approach, because fetching the service principal keys from a key vault would have been possible all along. We don't even know the secrets ourselves, because we create them programmatically and create the service connections in Azure DevOps directly. So I would suggest there should be a feasible solution from the provider itself, which is why I filed this issue. The azurecli token is there, Terraform authenticates with it, and it works. So the same service connection (SP) should be able to use the token to authenticate to the k8s plane.
We have kubelogin on our Azure agents, but how to configure it properly is a mess. Also, the documentation of the provider is still wrong, because it just doesn't work...
I tried this but still get Unauthorized when trying to read a data "kubernetes_namespace_v1" object:
# Configure the Kubernetes provider
provider "kubernetes" {
host = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.host
username = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.username
password = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.password
client_certificate = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.client_certificate)
client_key = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.client_key)
cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.cluster_ca_certificate)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "kubelogin"
args = [
"get-token",
"--login",
"azurecli",
"--server-id",
"Manually extracted for testing"
]
}
}
I don't think this is a feasible approach because fetching the service principal keys from a key vault would have been possible all along. But we don't even know the secrets ourselves because we create them programmatically and create the service connections in Azure DevOps directly.
In fairness to those tracking the issue, this is a different problem though: the password for the service principal is exposed via the Service Principal Password resource, which can then be referenced later in the run, or, if another team is creating the SP via Terraform in a different repo, they would need to put the password in some backend like HashiCorp Vault or Azure Key Vault so that your user could pick it up later (it looks like the data lookups for SPs don't expose the password).
Re: the idea of making the file, it might work using local_file with kube_config_raw and then putting a depends_on on the provider. That said, that feels like it is once again getting out of scope for the actual issue: the azurerm_kubernetes_cluster resource provides no "nice way" of getting the server-id for kubelogin, and one should be added as an enhancement so people don't have to hack around it. Kubelogin is not going away anytime soon, so it would be very helpful.
Edit: just noticed there is local_sensitive_file which is way more appropriate for the kubeconfig, so some quick pseudo-code:
resource "local_sensitive_file" "kubeconfig" {
content = yamldecode(data.azurerm_kubernetes_cluster.aks.kube_config_raw)
filename = var.kubeconfig_filepath
}
The issue there, though, is that now there's a file with sensitive material lying around that would need to be cleaned up at the end of every run.
So the catch-22 when I try the following is that the plan step always fails because the kubeconfig does not exist yet, so our pipelines fail.
resource "local_sensitive_file" "kubeconfig" {
content = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config_raw
filename = "./kubeconfig"
provisioner "local-exec" {
command = "kubelogin convert-kubeconfig --login azurecli --kubeconfig ./kubeconfig"
}
}
provider "kubernetes" {
config_path = local_sensitive_file.kubeconfig.filename
}
Is there someone from the HashiCorp team watching this? Any solutions? Thanks!
Ok, so now I have an acceptable solution. First, the Azure Kubernetes Service AAD Server Enterprise Application ID is the same for every cluster in the same tenant; it is even the same across different subscriptions. So if you run multiple AKS clusters per environment, the Application ID (== --server-id) is the SAME. We now hardcoded the ID as a variable, and I can confirm this works:
provider "kubernetes" {
host = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.host
cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.cluster_ca_certificate)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "kubelogin"
args = [
"get-token",
"--login",
"azurecli",
"--server-id",
var.env_config[var.ENV][ "server_id" ]
]
}
}
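For reference, the env_config variable used above would be shaped roughly like this; the environment key and the values are placeholders, and server_id is the Application ID of the "Azure Kubernetes Service AAD Server" enterprise app:
variable "env_config" {
  type = map(map(string))
  default = {
    dev = {
      aks_cluster_name = "XXX"
      aks_rg_name      = "XXX"
      server_id        = "XXX"
    }
  }
}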
So the main issue with my former approach was that I was under the impression that if I don't remove these lines:
username = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.username
password = data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.password
client_certificate = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.client_certificate)
client_key = base64decode(data.azurerm_kubernetes_cluster.aks_provider_config.kube_config.0.client_key)
from the provider config, it would still use the kubelogin token. BUT IT DOESN'T. IT TRIES TO AUTHENTICATE WITH THE CERT AND KEY. That's why my first approach failed...
So if you run your Terraform in an Azure CLI token environment, this is the way to go...
Still, I think this should be a built-in feature of the provider, supporting this behavior without hacking the exec plugin.
Hi @slzmruepp, according to our documentation this is the correct way to configure the provider for auth plugins.
@sheneska - could you please provide a link to that documentation?
@sheneska Obviously this is not how it works when you use azurecli context login, which a lot of TF tasks in pipelines do. So neither way is part of the documentation so far; at least I did not find it. This feels pretty hacky, and only some sources on the internet document even the SPN way of configuring the provider's exec plugin. So if this is the "official way", I would certainly expect it to be documented. See here: https://github.com/hashicorp/terraform-provider-kubernetes/tree/main/_examples/aks https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs#exec-plugins https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/guides/getting-started#provider-setup
All documentation points to an approach that certainly does NOT work for limited permissions with AKS and kubelogin.
Hi @browley86, we document the use of exec plugins; however, we are not able to document every configuration argument for every plugin that is available. Please refer to the documentation for the specific plugins on how to actually configure them.
@sheneska, I totally get it: the exec statement is a sort of catch-all for Kubernetes plugins, and documenting every use case is impossible. That said, in the case of the Azure AKS service, they seem to have, at least for the time being, standardized on kubelogin, so documenting that use case would probably be worthwhile. In any case, my ask here is less a bug and more a feature request: it would be extremely nice/convenient for the data object to expose the server ID for the kubelogin plugin. So, for example, instead of using:
"--server-id",
yamldecode(data.azurerm_kubernetes_cluster.aks.kube_config_raw)["users"][0]["user"]["auth-provider"]["config"]["apiserver-id"],
It would be way nicer to use
"--server-id",
data.azurerm_kubernetes_cluster.aks.server_id
Considering the Terraform provider gets this as part of its response in the code, it would be nice to expose it as an additional output so end-users can leverage it without having to hack the kube_config_raw portion of the data lookup as I did above. Hopefully that makes sense, but if not please let me know.
@browley86 would it make it easier to use something like
"--tenant-id",
data.azurerm_client_config.current.tenant_id,
and
"--client-id",
data.azurerm_client_config.current.client_id,
to look up those two from the current context rather than having to also look them up from your vault / key vault?
It does not help with getting the server_id however ;)
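For reference, that lookup is the zero-argument azurerm_client_config data source, which exposes the identity Terraform is currently running as:
# Exposes tenant_id, client_id, subscription_id and object_id of the current credentials
data "azurerm_client_config" "current" {}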
Btw., it looks like the yamldecode approach is not working anymore, or it is not working because of our AKS API setup here (private API server with VNet integration):
╷
│ Error: Invalid index
│
│ on .terraform/modules/aks/versions.tf line 34, in provider "kubernetes":
│ 34: yamldecode(azurerm_kubernetes_cluster.kubernetes_cluster.kube_config_raw)["users"][0]["user"]["auth-provider"]["config"]["apiserver-id"],
│
│ The given key does not identify an element in this collection value.
╵
So my above account got swallowed, but the email made its way to my personal account. Anywho, I had the wrong key above; it should be:
server_id = yamldecode(azurerm_kubernetes_cluster.aks.kube_config_raw)["users"][0]["user"]["exec"]["args"][4]
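A slightly more defensive variant (just a sketch, assuming the exec args always contain a --server-id flag) looks the value up by flag name instead of relying on a fixed position:
locals {
  kubelogin_args = yamldecode(azurerm_kubernetes_cluster.aks.kube_config_raw)["users"][0]["user"]["exec"]["args"]
  # Take the element that follows "--server-id" rather than hard-coding index 4
  aks_server_id  = local.kubelogin_args[index(local.kubelogin_args, "--server-id") + 1]
}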
I am not sure why this extended server_id is required in your context?
I got the following code working:
# data "azuread_service_principal" "aks" {
# display_name = "Azure Kubernetes Service AAD Server"
# }
provider "kubernetes" {
host = module.aks.cluster.kube_config.0.host
cluster_ca_certificate = base64decode(
module.aks.cluster.kube_config[0].cluster_ca_certificate,
)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "kubelogin"
args = [
"get-token",
"--login",
"azurecli",
"--server-id",
"6dae42f8-4368-4678-94ff-3960e28e3630" # data.azuread_service_principal.aks.client_id
]
}
}
resource "kubernetes_namespace" "default" {
metadata {
name = "helloworld"
}
}
I have local accounts disabled and I am using my user account with the "Azure Kubernetes Service RBAC Cluster Admin" role assigned. I would expect that any Azure CLI-authenticated context would work here?
Sorry for the long delay, but I just wanted to close the loop here: this, above, is the best answer. In short, Microsoft creates an Enterprise Application called "Azure Kubernetes Service AAD Server", and the Application ID of that Enterprise app is the server_id. A quick aside: this blew my mind 🤯. Anyway, instead of using the kube_config_raw path returned by the cluster build, it is way easier to just use a data lookup:
data "azuread_service_principal" "aks" {
display_name = "Azure Kubernetes Service AAD Server"
}
provider "kubectl" {
host = module.aks.cluster.kube_config.0.host
cluster_ca_certificate = base64decode(
module.aks.cluster.kube_config[0].cluster_ca_certificate,
)
load_config_file = false
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "kubelogin"
args = [
"get-token",
"--login",
"msi",
"--client-id",
<CLIENT_ID of managed identity>,
"--server-id",
data.azuread_service_principal.aks.client_id
]
}
}
A few notes though: this is all to work around the fact that some people consider client IDs and object IDs to be sensitive. If that is the case, the data lookup with a sensitive wrapper will work here. That said, the bigger potential issue is the separation of concerns: by using the data lookup, the SP/managed identity will need access to read AD, which means a whole other provider setup (azuread vs azurerm) and, in my case, giving the managed identity permissions to do AD look-ups. In a limited environment and/or with very strict permissions, this may not be available.
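A minimal sketch of the sensitive wrapper mentioned above, assuming the azuread data lookup from the previous block:
locals {
  # Redacts the looked-up app ID in plan/apply output for those who treat it as sensitive
  aks_server_id = sensitive(data.azuread_service_principal.aks.client_id)
}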
TL;DR: If you are limited to the azurerm provider, use the kube_config_raw path from the AKS output; otherwise, get azuread working with the SP/managed identity and use the data lookup.
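And a minimal sketch of the azuread provider setup that the data lookup requires; with no explicit arguments it authenticates much like azurerm (Azure CLI or ARM_* environment variables; use_msi = true is needed for managed identity):
provider "azuread" {
  # Credentials come from the Azure CLI, ARM_* environment variables, or use_msi = true
}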
Hi everyone!
I found this issue looking for solutions, and thanks to this last response I was able to understand it and it worked in my case, so I am sharing it here in case it helps others.
In our case, we granted all the necessary roles to the Service Principal (to be cluster-admin in Kubernetes), but it was still getting 401 Unauthorized, and this was because I was trying to use a wrong --server-id. The kubelogin documentation explains that this is the application used by the server side, and that the access token for accessing AKS clusters needs to be issued for this app. As others were commenting earlier, this is the application ID of the Microsoft-managed enterprise application named "Azure Kubernetes Service AAD Server". When I used this specific fixed GUID for the --server-id option... then it started working as expected!
It also blew my mind, as it's not clearly documented anywhere in Azure's documentation... but at least it's in kubelogin's documentation. In case it helps, that was the missing piece in our case. This is the magic --server-id we were missing:
6dae42f8-4368-4678-94ff-3960e28e3630
I hope it helps others with the same problem.