
Validating Kubernetes cluster with kops on AWS

bilalmushtaq514 opened this issue 1 year ago · 7 comments

Hi, I am facing the following error while validating the cluster:

Error: validation failed: unexpected error during validation: error listing nodes: Get "https://api-test-k8s-local-7afed8-88de7399e241b2e1.elb.us-east-2.amazonaws.com/api/v1/nodes": dial tcp 18.217.140.139:443: i/o timeout
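
For what it's worth, the timeout in that error can usually be reproduced outside of kops with a quick reachability check against the endpoint it mentions, for example:

# does the API hostname resolve, and is port 443 reachable from this machine?
nslookup api-test-k8s-local-7afed8-88de7399e241b2e1.elb.us-east-2.amazonaws.com
curl -kv --max-time 10 https://api-test-k8s-local-7afed8-88de7399e241b2e1.elb.us-east-2.amazonaws.com/api/v1/nodes

If curl also times out, the API endpoint isn't reachable from this network at all (security groups, an internal-only load balancer, or no healthy control-plane targets behind it); a quick 401/403-style response would mean the load balancer itself is reachable.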

bilalmushtaq514 · Aug 19 '23 11:08

Please try https://kops.sigs.k8s.io/operations/troubleshoot/.

hakman · Aug 21 '23 08:08

@bilalmushtaq514 it seems like you're failing to reach the k8s API after the cluster was set up. The connection timeout makes me think it's either a missing security group rule or a config issue. One thing that comes to mind right away is that you might've set up your cluster with an internal load balancer, or that you've locked down API access to certain IPs. However, it's hard to tell without the full context of your cluster. Would you mind sharing your cluster spec here (you can get it by running kops get cluster --state <S3_BUCKET_NAME> --name <CLUSTER_NAME> -o yaml)? This could shed a bit more light on the root cause. Thanks!
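
A rough sketch of what that could look like from a terminal (same placeholders as above; the security-group listing is just an illustrative extra check, and the KubernetesCluster tag filter is an assumption about how your groups are tagged):

# dump the cluster spec, as requested above
kops get cluster --state <S3_BUCKET_NAME> --name <CLUSTER_NAME> -o yaml

# optionally, list the cluster's security groups (assuming kops tagged them with KubernetesCluster)
# and check that 443 is open to your source IP
aws ec2 describe-security-groups --filters "Name=tag:KubernetesCluster,Values=<CLUSTER_NAME>"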

moshevayner · Aug 25 '23 05:08

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2023-08-27T15:43:34Z"
 
spec:
  api:
    dns: {}
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://kops-state-21/kubevpro.grooply.online
  dnsZone: kubevpro.grooply.online
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-us-east-1a
      name: a
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-us-east-1a
      name: a
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  - ::/0
  kubernetesVersion: 1.27.5
  masterPublicName: api.kubevpro.grooply.online
  networkCIDR: 172.20.0.0/16
  networking:
    cilium:
      enableNodePort: true
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  - ::/0
  subnets:
  - cidr: 172.20.32.0/19
    name: us-east-1a
    type: Public
    zone: us-east-1a
  - cidr: 172.20.64.0/19
    name: us-east-1b
    type: Public
    zone: us-east-1b
  topology:
    dns:
      type: Public
    masters: public
    nodes: public
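
For reference, a minimal sketch of how validation would typically be re-run against this spec, with the state store and cluster name taken from the configBase above:

# point kubectl/kops at this cluster's API endpoint
kops export kubecfg kubevpro.grooply.online --admin --state s3://kops-state-21

# re-run validation, waiting up to 10 minutes for the control plane and nodes to become ready
kops validate cluster --name kubevpro.grooply.online --state s3://kops-state-21 --wait 10m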

bilalmushtaq514 · Aug 27 '23 16:08

It seems like the YAML structure got a bit messed up by the markdown formatting, so it's harder to tell what goes where. Would you mind wrapping it in a fenced code block with yaml syntax highlighting?
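
For example, something along these lines (only the first few lines of your spec shown, purely to illustrate the markup):

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2023-08-27T15:43:34Z"
spec:
  ...
```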

Thanks!

moshevayner · Sep 06 '23 13:09

/kind support

johngmyers · Sep 07 '23 05:09

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · Jan 27 '24 20:01

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · Feb 26 '24 21:02

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot · Mar 27 '24 22:03

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · Mar 27 '24 22:03