NLB for apiserver port 8443 is unreachable

jim-barber-he opened this issue 1 year ago • 10 comments

/kind bug

1. What kops version are you running? The command kops version will display this information.

It's actually a build from the master branch since I was testing a fix for another issue that I had previously encountered (that is now fixed). It was built like so:

$ go version  
go version go1.21.3 linux/amd64
$ export S3_BUCKET=hetest-kops
$ export VERSION=1.28.0-dev.1 
$ make kops-install VERSION=$VERSION
$ make upload S3_BUCKET=s3://$S3_BUCKET VERSION=$VERSION

And results in

$ kops version
Client version: 1.28.0-dev.1 (git-v1.29.0-alpha.1-139-gab5b8a873a)

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

$ kubectl version                                            
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.2

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

I created a cluster from a manifest the same way I usually create clusters, this time using this custom kops build to deploy a Kubernetes 1.28 cluster.

5. What happened after the commands executed?

I couldn't talk to the API server externally.

6. What did you expect to happen?

Be able to use the cluster.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

I'll just provide the part that probably matters. We use a custom SSL certificate for the API server with the name api.$CLUSTER_NAME. It is provided in the cluster spec like so:

spec:
  api:
    loadBalancer:
      class: Network
      sslCertificate: arn:aws:acm:$AWS_REGION:$AWS_ACCOUNT:certificate/REDACTED
      sslPolicy: ELBSecurityPolicy-TLS13-1-3-2021-06
      type: Internal

When using a custom SSL certificate, listeners are created on the NLB for port 8443 in addition to 443, and kops export kubecfg --admin --name $CLUSTER creates entries in .kube/config referring to port 8443.
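For reference, the resulting kubeconfig entry looks roughly like this (a sketch; the cluster name and certificate data are placeholders):

apiVersion: v1
clusters:
  - cluster:
      certificate-authority-data: REDACTED
      server: https://api.$CLUSTER_NAME:8443
    name: $CLUSTER_NAME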

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?

The NLB for the new cluster has a security group attached, which is a relatively new AWS feature. Examining it, I found rules for port 443 but none for 8443. Editing the security group and duplicating the port 443 rules for port 8443 fixed my problem.
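For illustration, the manual workaround amounts to something like the following (the security group ID and source CIDR are placeholders; use the same source as the existing 443 rule):

$ aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 8443 \
    --cidr 10.0.0.0/8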

jim-barber-he avatar Oct 16 '23 06:10 jim-barber-he

The original PR which broke this: https://github.com/kubernetes/kops/pull/15993

We should add 8443 rules to the NLB security group as well (if needed).

zetaab avatar Oct 16 '23 06:10 zetaab

This should probably be very similar to https://github.com/kubernetes/kops/pull/16006.

hakman avatar Oct 17 '23 12:10 hakman

@hakman Can I take this up?

karanrn avatar Dec 21 '23 04:12 karanrn

Should the addition of port 8443 to the security group rules be conditional, or should we add it by default along with 443?

karanrn avatar Dec 21 '23 08:12 karanrn

@karanrn Sure, it is conditional, same condition as for adding the port to the NLB.
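In rough terms, the intended logic is something like this Go sketch (hypothetical types and names, not the actual kops source; the condition is whether a custom SSL certificate is configured, since that is what triggers the 8443 listener):

package main

import "fmt"

// LoadBalancerSpec is a hypothetical stand-in for the relevant part of
// the kops cluster spec; the real types in the kops source differ.
type LoadBalancerSpec struct {
	SSLCertificate string
}

// apiPorts returns the ports that both the NLB listeners and the NLB
// security group rules should cover: 8443 is included only when a custom
// SSL certificate is configured, mirroring the listener condition.
func apiPorts(lb LoadBalancerSpec) []int {
	ports := []int{443}
	if lb.SSLCertificate != "" {
		ports = append(ports, 8443)
	}
	return ports
}

func main() {
	lb := LoadBalancerSpec{SSLCertificate: "arn:aws:acm:REGION:ACCOUNT:certificate/REDACTED"}
	fmt.Println(apiPorts(lb)) // [443 8443]
}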

hakman avatar Dec 21 '23 15:12 hakman

@jim-barber-he I have a fix, but I wanted to confirm with you that client-cert authentication does not work with a custom-certificate on port 443 (but does work on port 8443). So I assume you're using one of the auth systems like dex (?)

justinsb avatar Mar 13 '24 01:03 justinsb

> @jim-barber-he I have a fix, but I wanted to confirm with you that client-cert authentication does not work with a custom-certificate on port 443 (but does work on port 8443). So I assume you're using one of the auth systems like dex (?)

I am using the AWS IAM Authenticator as set up by kops via:

spec:
  aws:
    backendMode: CRD
    clusterID: $CLUSTER_NAME
    identityMappings:
      - arn: arn:aws:iam::$AWS_ACCOUNT_ID:role/$ROLE_ADMIN
        groups:
          - system:masters
        username: admin:{{`{{SessionNameRaw}}`}}
      - ...

Plus CRDs for other IAM to k8s mappings.
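
For context, with the AWS IAM Authenticator the kubeconfig typically authenticates via an exec plugin along these lines (a sketch; the cluster name is a placeholder):

users:
  - name: $CLUSTER_NAME
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1beta1
        command: aws-iam-authenticator
        args:
          - token
          - -i
          - $CLUSTER_NAME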

We did have dex in place a number of years ago when we first came up with the cluster specs.

jim-barber-he avatar Mar 14 '24 03:03 jim-barber-he

@jim-barber-he Thank you for the explanation! Any way to test the https://github.com/kubernetes/kops/pull/16405 cherry-pick we did yesterday for kOps 1.28?

hakman avatar Mar 14 '24 04:03 hakman

Yeah I can give it a go by rolling out a test cluster, but probably won't be until next week.

FYI: kOps 1.28 wasn't having an issue for me; I hit the problem when I built from the master branch to test something else, so I reported it before it ended up breaking kOps 1.29 for me. So to test, I assume I'd need to build from master again?

jim-barber-he avatar Mar 14 '24 04:03 jim-barber-he

Ah, cool. Not sure it is so easy to test the master branch build (there may be some nodeup changes), but please give it a try. We will also release an official beta.1 this week.

hakman avatar Mar 14 '24 04:03 hakman

I've rolled out a k8s 1.28.8 cluster this morning with a kops binary built from the v1.29.0-beta.1 git tag. The cluster has come up properly and is accessible.

The security group associated with the API server NLB has rules in it for port 8443.
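A quick way to verify the same thing from the CLI (the security group ID is a placeholder):

$ aws ec2 describe-security-groups \
    --group-ids sg-0123456789abcdef0 \
    --query 'SecurityGroups[].IpPermissions[?ToPort==`8443`]'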

So it looks like this issue is fixed.

jim-barber-he avatar Mar 20 '24 00:03 jim-barber-he

Perfect. Thank you for confirming @jim-barber-he! /close

hakman avatar Mar 20 '24 05:03 hakman

@hakman: Closing this issue.

In response to this:

Perfect. Thank you for confirming @jim-barber-he! /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 20 '24 05:03 k8s-ci-robot