failed calling webhook "mservice.elbv2.k8s.aws"

Open mayurbhagia opened this issue 1 year ago • 4 comments
I am installing Spark Operator with YuniKorn from Cloud9 in my AWS account, and install.sh ends with the following two errors:

Error: 2 errors occurred:
  * Internal error occurred: failed calling webhook "mservice.elbv2.k8s.aws": failed to call webhook: Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/mutate-v1-service?timeout=10s": dial tcp 100.64.184.123:9443: connect: connection refused
  * Internal error occurred: failed calling webhook "mservice.elbv2.k8s.aws": failed to call webhook: Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/mutate-v1-service?timeout=10s": dial tcp 100.64.184.123:9443: connect: connection refused

mayurbhagia, Feb 28 '24 12:02

I think it's a timing issue. Running terraform apply again, or rerunning install.sh, should fix it.

Please feel free to update the troubleshooting guide (https://github.com/awslabs/data-on-eks/blob/main/website/docs/blueprints/troubleshooting/troubleshooting.md) if the above approach resolves the issue.

vara-bonthu, Mar 02 '24 05:03

Currently, this consistently requires two executions of install.sh.
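Until the ordering issue is fixed, the second run can be automated. A minimal sketch, assuming install.sh sits in the current directory and is safe to run more than once; the `retry` helper, its attempt count, and its delay are hypothetical and not part of the blueprint:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rerun a command until it succeeds or attempts run out.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts="$1"; shift
  local delay_seconds="$1"; shift
  local attempt=1
  while true; do
    # Run the command; stop as soon as it exits successfully.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay_seconds}s" >&2
    sleep "$delay_seconds"
    attempt=$((attempt + 1))
  done
}

# Example (assumed values): allow one rerun, waiting 30s so the
# webhook pods have time to start accepting connections.
# retry 2 30 ./install.sh
```

This only papers over the race; it does not remove the dependency on the webhook being ready.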

raykrueger, Mar 18 '24 14:03

I'm betting we need to bump up that 10s timeout, but currently we'd be blocked on... https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2711

raykrueger, Mar 18 '24 14:03

This is caused by the mutating webhook for services introduced in AWS Load Balancer Controller (LBC) v2.5+. Per the docs:

The AWS LBC provides a mutating webhook for service resources to set the spec.loadBalancerClass field for service of type LoadBalancer on create. This makes the AWS LBC the default controller for service of type LoadBalancer. You can disable this feature and revert to set Cloud Controller Manager (in-tree controller) as the default by setting the helm chart value enableServiceMutatorWebhook to false with --set enableServiceMutatorWebhook=false . You will no longer be able to provision new Classic Load Balancer (CLB) from your kubernetes service unless you disable this feature. Existing CLB will continue to work fine.

If you do not need the webhook enabled, you can disable it as shown here:

  # Turn off mutation webhook for services to avoid ordering issue
  enable_aws_load_balancer_controller = true
  aws_load_balancer_controller = {
    set = [{
      name  = "enableServiceMutatorWebhook"
      value = "false"
    }]
  }

Ref: https://github.com/aws-ia/terraform-aws-eks-blueprints-addons/blob/257677adeed1be54326637cf919cf24df6ad7c06/tests/complete/main.tf#L120-L125

We should add this to our blueprints. I will mark it as a bug for tracking.

askulkarni2, Mar 19 '24 11:03