How to run Workload Identity on GKE
Hi,
I'm using multiple google*Rverse R packages, and I would like to authenticate via Workload Identity on GKE. (Right now I do it via service account json-s.)
Is this currently possible via gargle? I was looking at the documentation page, but I gather that it will only work on AWS. If that's not the case, would it be possible to update the documentation with a more descriptive example of the steps needed to follow?
If it's not supported, would it be possible to implement it?
Thank you 🙏 Tamas
I think this is possible and am working with it now:
https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
I believe it requires a change to the metadata call though to be similar to the example in the link above:
curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/
I guess http://169.254.169.254/ is universal, as there are no DNS servers within the pod? Or perhaps that only applies if you are on Kubernetes (?). Anyhow, checking gargle, it looks like this is supported via gargle:::gce_metadata_url(), and so should be by credentials_gce() and googleAuthR::gar_gce_auth()
I think perhaps it needs to be enabled via options(gargle.gce.use_ip = TRUE)
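A minimal sketch of how one might check the metadata endpoint from inside a pod, mirroring the curl call above. The function names `metadata_sa_url` and `list_pod_service_accounts` are made up for illustration. The "dns" mode assumes the standard `metadata.google.internal` hostname, while the "ip" mode uses the link-local address that `options(gargle.gce.use_ip = TRUE)` is meant to select.

```shell
#!/bin/sh
# Hypothetical helpers, not part of gargle or any Google tooling.

# Build the metadata-server URL that lists service accounts.
#   ip  -> link-local address used in the curl example above
#   dns -> standard metadata hostname
metadata_sa_url() {
  case "$1" in
    ip)  host="169.254.169.254" ;;
    dns) host="metadata.google.internal" ;;
    *)   echo "usage: metadata_sa_url ip|dns" >&2; return 1 ;;
  esac
  printf 'http://%s/computeMetadata/v1/instance/service-accounts/\n' "$host"
}

# List the service accounts visible to the current pod.
# Only meaningful when run inside a pod on GKE.
list_pod_service_accounts() {
  url=$(metadata_sa_url "${1:-ip}") || return 1
  curl -s -H "Metadata-Flavor: Google" "$url"
}
```

Running `list_pod_service_accounts ip` and `list_pod_service_accounts dns` inside the pod should show whether the hostname resolves there or whether only the IP form works.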
Ok I have been able to test this finally during a refactor of an Airflow Docker container running R, Airflow being on Kubernetes. Putting it here to help with documentation.
It's the "right" way to do authentication on K8s and other places if possible, since it avoids downloading keys, which is a potential security risk.
- Following the docs, you create a service account as normal and give it the permissions and scopes needed to, say, upload to BigQuery, as you would before, e.g. my-service-key@my-project.iam.gserviceaccount.com with https://www.googleapis.com/auth/bigquery scopes
- Instead of downloading a JSON key, you migrate that permission by adding a policy binding to another service account within Kubernetes
- Create the service account within Kubernetes, ideally within a new namespace:
# create namespace
kubectl create namespace my-namespace
# Create Kubernetes service account
kubectl create serviceaccount --namespace my-namespace bq-service-account
- Bind that Kubernetes service account to the service account outside of kubernetes you created in step 1, and assign it an annotation
# Create IAM policy binding between k8s SA and GSA
gcloud iam service-accounts add-iam-policy-binding my-service-key@my-project.iam.gserviceaccount.com \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:my-project.svc.id.goog[my-namespace/bq-service-account]"
# Annotate k8s SA
kubectl annotate serviceaccount bq-service-account \
--namespace my-namespace \
iam.gke.io/gcp-service-account=my-service-key@my-project.iam.gserviceaccount.com
This service account will now be available to add to pods within the cluster. For Airflow, you can pass it in using the GKEPodOperator(...., namespace='my-namespace', service_account_name='bq-service-account')
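The setup steps above can be sketched as one parameterised script. This is only an illustration using the placeholder names from the commands above. The helper names `wi_member` and `setup_workload_identity` are hypothetical, and the script assumes `kubectl` and `gcloud` are already authenticated against a cluster with Workload Identity enabled.

```shell
#!/bin/sh
# Placeholder values taken from the example above; replace with your own.
PROJECT="my-project"
NAMESPACE="my-namespace"
KSA="bq-service-account"                                  # Kubernetes service account
GSA="my-service-key@${PROJECT}.iam.gserviceaccount.com"   # Google service account

# Build the Workload Identity member string:
#   serviceAccount:PROJECT.svc.id.goog[NAMESPACE/KSA]
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}

# Run the four steps in order, stopping at the first failure.
setup_workload_identity() {
  kubectl create namespace "$NAMESPACE" &&
  kubectl create serviceaccount --namespace "$NAMESPACE" "$KSA" &&
  gcloud iam service-accounts add-iam-policy-binding "$GSA" \
    --role roles/iam.workloadIdentityUser \
    --member "$(wi_member "$PROJECT" "$NAMESPACE" "$KSA")" &&
  kubectl annotate serviceaccount "$KSA" \
    --namespace "$NAMESPACE" \
    "iam.gke.io/gcp-service-account=${GSA}"
}
```

Nothing runs until `setup_workload_identity` is called. Keeping the member string in one helper avoids the easy mistake of mismatching the namespace or service account name between the binding and the annotation.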
- When calling gargle::credentials_gce() within R, you first need to make sure it's using the right endpoint (options(gargle.gce.use_ip = TRUE)) and then pass the service account email, which is not "default". gargle:::list_service_accounts() was helpful in debugging (maybe export this?)
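# code within the Docker container
library(bigQueryR)
options(gargle.gce.use_ip = TRUE)
# pass the email by name: the first positional argument of credentials_gce() is `scopes`
gargle::credentials_gce(service_account = "my-service-key@my-project.iam.gserviceaccount.com")
# ... do authenticated stuff ...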
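When the R call fails, it can help to check outside R whether the metadata server will hand out a token for the non-default service account. A rough sketch, assuming the standard per-account `token` path under the service-accounts endpoint shown earlier. The helper names `token_url` and `fetch_token` are hypothetical, and the email is the placeholder from the annotation step.

```shell
#!/bin/sh
# Hypothetical debugging helpers; run inside the pod.

# Build the token URL for a given service account (defaults to "default").
token_url() {
  printf 'http://169.254.169.254/computeMetadata/v1/instance/service-accounts/%s/token\n' "${1:-default}"
}

# Ask the metadata server for an access token for that account.
# On success the response is a JSON document containing the token.
fetch_token() {
  curl -s -H "Metadata-Flavor: Google" "$(token_url "${1:-default}")"
}
```

For example, `fetch_token my-service-key@my-project.iam.gserviceaccount.com` run in the pod should return a token if the binding and annotation are in place. An error response here points at the Kubernetes or IAM setup rather than at gargle.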
Can we conclude that the GCE method of auth covers this use case, with specific setup and usage given by @MarkEdmondson1234?
And if so, I think this has become a documentation issue?
I agree it's a documentation issue; no code changes were needed in gargle itself. It's a useful one though, since authenticating without service account keys is the recommended approach in Kubernetes.
@MarkEdmondson1234 Would you be willing to make a PR? It could be pretty crude and I would finish it. But I think you have a better sense of the use case than I do, so a sketch from you would be helpful. You could also just make some suggestions here about where you would document it within gargle.
Yes ok, the main meat of it is the comment above, and I think it would go in the non-interactive auth section: https://gargle.r-lib.org/articles/non-interactive-auth.html