data-on-eks

spark-k8s-operator requires the AWS CLI, which doesn't work on Terraform Enterprise

Open · dacort opened this issue 9 months ago · 4 comments

Description

When running the spark-k8s-operator example on Terraform Enterprise, the apply fails with the following error:

```
Error: Kubernetes cluster unreachable: Get "https://<ID>.sk1.us-east-1.eks.amazonaws.com/version": getting credentials: exec: executable aws not found

It looks like you are trying to use a client-go credential plugin that is not installed.

To learn more about this feature, consult the documentation available at:
      https://kubernetes.io/docs/reference/access-authn-authz/authentication/#client-go-credential-plugins
```

After hunting around for a while, I found this issue and realized the problem is likely specific to Terraform Cloud/Enterprise.

That said, the emr-eks-karpenter example doesn't have this problem because it uses the aws_eks_cluster_auth data source instead of invoking the AWS CLI. It would be great to update the spark-k8s-operator example to use the same approach.
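For reference, a minimal sketch of the aws_eks_cluster_auth approach is shown below. It assumes the cluster is created by the terraform-aws-modules EKS module referenced as `module.eks` (the output names `cluster_name`, `cluster_endpoint`, and `cluster_certificate_authority_data` come from that module) and uses Helm provider 2.x syntax; adjust both to match the actual blueprint. The token is obtained through the AWS provider's own credentials, so no `aws` binary is needed on the worker.

```hcl
# Fetch an EKS authentication token through the AWS provider
# (no AWS CLI required on the machine running Terraform).
# Assumes a `module.eks` exposing the outputs used below.
data "aws_eks_cluster_auth" "this" {
  name = module.eks.cluster_name
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  token                  = data.aws_eks_cluster_auth.this.token
}

# Helm provider 2.x syntax (nested `kubernetes` block); it needs the same credentials.
provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
    token                  = data.aws_eks_cluster_auth.this.token
  }
}
```

The trade-off is that the token is read once per run and is short-lived (on the order of 15 minutes), so very long applies can hit authentication errors with this method.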

  • [x] ✋ I have searched the open/closed issues and my issue is not listed.

Versions

  • Module version [Required]: latest

  • Terraform version: v1.0.5

  • Provider version(s): unsure

Reproduction Code [Required]

Steps to reproduce the behavior:

  • Attempt to deploy the spark-k8s-operator example in Terraform Cloud

Expected behavior

  • The apply succeeds

Actual behavior

  • The apply fails with the error shown above

Terminal Output Screenshot(s)

Additional context

Switching to the approach the emr-eks-karpenter example uses (the aws_eks_cluster_auth data source) makes the apply succeed.

dacort, Apr 30 '24

Thanks for raising the issue, @dacort!

You are correct that the emr-eks-karpenter blueprint uses aws_eks_cluster_auth (see here).

The Spark Operator blueprint, on the other hand, was recently updated to use exec plugin authentication, which is designed to refresh tokens more reliably than the previous approach (see here).
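For comparison, exec plugin authentication typically looks like the sketch below (same assumptions about `module.eks` and Helm provider 2.x; this is an illustration, not the blueprint's exact code). The providers shell out to `aws eks get-token` to obtain a token, which is why the `aws` executable must be on the worker's `PATH` and why its absence produces the "executable aws not found" error reported above.

```hcl
provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  # Runs `aws eks get-token` locally to obtain a fresh token;
  # fails with "executable aws not found" if the AWS CLI is not installed.
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}

provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

    # Same exec-based credential plugin for the Helm provider.
    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
    }
  }
}
```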

For the exec plugin, you need to install the AWS CLI locally as a prerequisite, because it runs a command locally to fetch the token. This approach might not work in Terraform Cloud if the AWS CLI is not installed on the Terraform Cloud agent/server. I am happy for you to raise a PR for this, or one of us will raise a PR based on your issue.

There is ongoing debate in the community about both approaches, and both seem to frequently run into the problem described here: Authentication Issues with EKS

vara-bonthu, Apr 30 '24

Ah, interesting, thanks for the context @vara-bonthu! Unfortunately, as you noted, I'm not sure how much control I have over installing the CLI in Terraform Cloud. I'll look into that.

dacort, Apr 30 '24

You can, in fact, install additional tools on the worker instance by using a null_resource with a local-exec provisioner. See https://developer.hashicorp.com/terraform/enterprise/run/install-software#installing-additional-tools for details.

Example:

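```hcl
resource "null_resource" "install_aws_cli" {
  triggers = { always_run = timestamp() } # re-run on every apply; each run may use a fresh worker
  provisioner "local-exec" {
    # Download and install AWS CLI v2 (assumes curl, unzip, and sudo exist on the worker)
    command = "cd /tmp && curl -sSL https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip -o awscliv2.zip && unzip -qo awscliv2.zip && sudo ./aws/install --update"
  }
}
```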

Note: I have not tested the above, so it may not work; please let us know here if a different command is required.

otterley, Apr 30 '24

Thanks @otterley!

dacort, Apr 30 '24

This issue has been automatically marked as stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this issue will be closed in 10 days.

github-actions[bot], May 31 '24

Issue closed due to inactivity.

github-actions[bot], Jun 10 '24