
Adding a subnet to the cluster.spec throws a null pointer exception

julienperignon opened this issue · 8 comments

/kind bug

1. What kops version are you running? The command kops version will display this information.

Version 1.23.2 (git-4125bbbe975ca104c609ceaa625a9a5ff3ac19f4)

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

1.23.6

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

Added two new subnets (one private and one utility) by editing the cluster spec, then ran kops update cluster --state s3://XXX
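
For reference, the change being applied is a pair of new entries under spec.subnets of roughly this shape (the names and CIDRs are taken from the manifest in question 7; exactly which two entries were the new ones is an assumption):

spec:
  subnets:
  # ...existing subnets unchanged...
  - cidr: 10.80.1.0/25              # new private subnet (assumed)
    name: ap-southeast-2a
    type: Private
    zone: ap-southeast-2a
  - cidr: 10.80.1.128/27            # new utility subnet (assumed)
    name: utility-ap-southeast-2a
    type: Utility
    zone: ap-southeast-2a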

5. What happened after the commands executed?

W0614 13:00:36.185532   11339 template_functions.go:420] --watch-ingress=true set on dns-controller
W0614 13:00:36.185579   11339 template_functions.go:421] this may cause problems with previously defined services: https://github.com/kubernetes/kops/issues/2496
W0614 13:00:36.610814   11339 external_access.go:39] KubernetesAPIAccess is empty
I0614 13:00:40.364585   11339 executor.go:111] Tasks: 0 done / 116 total; 51 can run
I0614 13:00:41.841497   11339 executor.go:111] Tasks: 51 done / 116 total; 26 can run
I0614 13:00:42.943478   11339 executor.go:111] Tasks: 77 done / 116 total; 31 can run
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2db89ee]

goroutine 790 [running]:
k8s.io/kops/upup/pkg/fi/cloudup/awstasks.(*NetworkLoadBalancer).CheckChanges(0x2, 0xc0007363c0, 0xc000736480, 0x1)
        upup/pkg/fi/cloudup/awstasks/network_load_balancer.go:494 +0x32e
reflect.Value.call({0x451d860, 0xc000736480, 0x0}, {0x469d4a3, 0x4}, {0xc001661680, 0x3, 0x51497a0})
        GOROOT/src/reflect/value.go:556 +0x845
reflect.Value.Call({0x451d860, 0xc000736480, 0x0}, {0xc001661680, 0x3, 0x4})
        GOROOT/src/reflect/value.go:339 +0xc5
k8s.io/kops/util/pkg/reflectutils.InvokeMethod({0x451d860, 0xc000736480}, {0x46b0225, 0x9}, {0xc00117dd78, 0x3, 0x51497a0})
        util/pkg/reflectutils/walk.go:77 +0x3c6
k8s.io/kops/upup/pkg/fi.invokeCheckChanges({0x4f6e6e0, 0xc0007363c0}, {0x4f6e6e0, 0xc000736480}, {0x4f6e6e0, 0xc00020f200})
        upup/pkg/fi/default_methods.go:114 +0xd6
k8s.io/kops/upup/pkg/fi.DefaultDeltaRunMethod({0x4f6e6e0, 0xc000736480}, 0xc0007da140)
        upup/pkg/fi/default_methods.go:71 +0x3e8
k8s.io/kops/upup/pkg/fi/cloudup/awstasks.(*NetworkLoadBalancer).Run(0xc0008eb4d0, 0x2e)
        upup/pkg/fi/cloudup/awstasks/network_load_balancer.go:450 +0x3b
k8s.io/kops/upup/pkg/fi.(*executor).forkJoin.func1(0xc000d323f0, 0x1)
        upup/pkg/fi/executor.go:187 +0x263
created by k8s.io/kops/upup/pkg/fi.(*executor).forkJoin
        upup/pkg/fi/executor.go:183 +0x85
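
The stack trace above shows the panic inside (*NetworkLoadBalancer).CheckChanges, which the fi task framework invokes via reflection with the task's actual, expected, and changes values. As a standalone illustration of that failure mode (not kops source; the type, field, and values below are invented), the following sketch shows how dereferencing a nil pointer field inside such a comparison produces exactly this SIGSEGV:

// Illustration only — not kops code: a minimal program that panics the same
// way when a nil pointer field is dereferenced inside a CheckChanges-style
// comparison of actual vs. expected task state.
package main

import "fmt"

// fakeTask stands in for an awstasks task whose optional fields are pointers.
type fakeTask struct {
	Name *string
}

// checkChanges mimics the actual/expected/changes convention visible in the
// stack trace. It dereferences e.Name without a nil check, so a partially
// populated task triggers a nil pointer dereference.
func checkChanges(a, e, changes *fakeTask) error {
	if a != nil {
		if *e.Name != *a.Name { // panics here when e.Name is nil
			return fmt.Errorf("Name cannot be changed")
		}
	}
	return nil
}

func main() {
	name := "api-internal.example.com"
	actual := &fakeTask{Name: &name}
	expected := &fakeTask{} // Name left unpopulated, e.g. after a spec change
	_ = checkChanges(actual, expected, nil)
	// panic: runtime error: invalid memory address or nil pointer dereference
}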

6. What did you expect to happen? I expected the update to print the pending changes so I could proceed with the update.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2021-05-14T04:01:25Z"
  generation: 32
  name: devops.redacted
spec:
  additionalNetworkCIDRs:
  - 10.80.0.0/22
  additionalPolicies:
    master: |
      [
        {
            "Effect":"Allow",
            "Action":[
              "ec2:*",
              "elasticloadbalancing:*",
              "iam:CreateServiceLinkedRole",
              "iam:GetServerCertificate",
              "iam:ListServerCertificates",
              "tag:GetResources",
              "elasticloadbalancing:*",
              "autoscaling:*",
              "acm:ListCertificates",
              "acm:GetCertificate",
              "acm:DescribeCertificate",
              "waf-regional:GetWebACLForResource",
              "route53:ListHostedZones",
              "route53:ListResourceRecordSets"
            ],
            "Resource":[
              "*"
            ]
        }
      ]
    node: |
      [
        {
            "Effect":"Allow",
            "Action":[
              "ec2:Describe*",
              "ec2:CreateSecurityGroup",
              "ec2:AuthorizeSecurityGroupIngress",
              "ec2:CreateTags",
              "iam:CreateServiceLinkedRole",
              "iam:GetServerCertificate",
              "iam:ListServerCertificates",
              "tag:GetResources",
              "elasticloadbalancing:*",
              "elasticloadbalancingv2:*",
              "autoscaling:*",
              "acm:ListCertificates",
              "acm:GetCertificate",
              "acm:DescribeCertificate",
              "waf-regional:GetWebACLForResource",
              "route53:ListHostedZones",
              "route53:ListResourceRecordSets",
              "cloudformation:*"
            ],
            "Resource":[
              "*"
            ]
        },
        {
            "Effect":"Allow",
            "Action":[
              "route53:ChangeResourceRecordSets"
            ],
            "Resource":[
              "arn:aws:route53:::hostedzone/*"
            ]
        },
        {
            "Effect":"Allow",
            "Action":[
              "ec2:*"
            ],
            "Resource":[
              "*"
            ]
        }
      ]
  api:
    loadBalancer:
      additionalSecurityGroups:
      - redacted
      class: Network
      crossZoneLoadBalancing: true
      sslCertificate: redacted
      type: Internal
  authorization:
    rbac: {}
  awsLoadBalancerController:
    enabled: true
  certManager:
    enabled: true
    managed: true
  channel: stable
  cloudConfig:
    disableSecurityGroupIngress: true
    elbSecurityGroup: redacted
  cloudControllerManager:
    allocateNodeCIDRs: true
    cloudProvider: aws
    clusterCIDR: 100.64.0.0/10
    clusterName: devops.redacted
    image: k8s.gcr.io/provider-aws/cloud-controller-manager:v1.23.0-alpha.0
    leaderElection:
      leaderElect: true
  cloudLabels:
    cumulus-stack: wo2-devops
  cloudProvider: aws
  configBase: s3://wo2-devops-k8s-kops/devops.redacted
  containerRuntime: docker
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: master-ap-southeast-2b
      name: b
    memoryRequest: 100Mi
    name: main
    version: 3.2.24
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: master-ap-southeast-2b
      name: b
    memoryRequest: 100Mi
    name: events
    version: 3.2.24
  externalDns:
    watchIngress: true
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    cloudProvider: external
    disableBasicAuth: true
    runtimeConfig:
      autoscaling/v2beta1: "true"
  kubeControllerManager:
    podEvictionTimeout: 30s
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
    cloudProvider: external
    networkPluginName: cni
  kubernetesVersion: 1.23.6
  masterInternalName: api-internal-devops.redacted
  masterPublicName: k8s-api-devops.redacted
  networkCIDR: 10.80.0.0/22
  networkID: vpc-00adc685acade9613
  networking:
    calico: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 10.0.0.0/24
  subnets:
  - cidr: 10.80.0.0/25
    name: ap-southeast-2b
    type: Private
    zone: ap-southeast-2b
  - cidr: 10.80.1.0/25
    name: ap-southeast-2a
    type: Private
    zone: ap-southeast-2a
  - cidr: 10.80.0.128/27
    name: utility-ap-southeast-2b
    type: Utility
    zone: ap-southeast-2b
  - cidr: 10.80.1.128/27
    name: utility-ap-southeast-2a
    type: Utility
    zone: ap-southeast-2a
  topology:
    dns:
      type: Public
    masters: private
    nodes: private

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2021-05-14T04:01:28Z"
  generation: 9
  labels:
    kops.k8s.io/cluster: devops.redacted
  name: agents
spec:
  cloudLabels:
    k8s.io/cluster-autoscaler/devops.redacted: owned
    k8s.io/cluster-autoscaler/enabled: "true"
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20220404
  machineType: c5.xlarge
  maxSize: 4
  minSize: 1
  nodeLabels:
    agent: "true"
    kops.k8s.io/instancegroup: agents
    node-role.kubernetes.io/agents: "true"
  role: Node
  securityGroupOverride: redacted
  subnets:
  - ap-southeast-2b
  taints:
  - agent=true:NoSchedule

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2022-06-14T02:15:00Z"
  generation: 1
  labels:
    kops.k8s.io/cluster: devops.redacted
  name: devops
spec:
  cloudLabels:
    k8s.io/cluster-autoscaler/devops.redacted: owned
    k8s.io/cluster-autoscaler/enabled: "true"
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20220404
  machineType: c5.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    devops: "true"
    kops.k8s.io/instancegroup: devops
    node-role.kubernetes.io/devops: "true"
  role: Node
  securityGroupOverride: redacted
  subnets:
  - ap-southeast-2b
  taints:
  - devops=true:NoSchedule

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2021-05-14T04:01:28Z"
  generation: 3
  labels:
    kops.k8s.io/cluster: devops.redacted
  name: master-ap-southeast-2b
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20220404
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-ap-southeast-2b
  role: Master
  subnets:
  - ap-southeast-2b

julienperignon commented on Jun 14, 2022

I could not immediately reproduce this on the master branch. Do you have a chance to test with the kops 1.24 beta 2?

olemarkus commented on Jun 14, 2022

I could not reproduce this in the 1.23 branch either.

olemarkus commented on Jun 14, 2022

@olemarkus do you need more details?

julienperignon commented on Jun 14, 2022

Maybe if I can get the before/after cluster spec? Not much to go on, I am afraid. I'll try to reproduce some more soon.

olemarkus commented on Jun 15, 2022

I tried to use your example above more or less verbatim, and I still fail to reproduce using kops 1.23.

olemarkus commented on Jun 15, 2022

@olemarkus I'll provide the before and after cluster specs when I get my hands back on my work computer.

julienperignon commented on Jun 15, 2022

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented on Sep 13, 2022

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented on Oct 13, 2022

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot commented on Nov 12, 2022

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot commented on Nov 12, 2022

I had this issue yesterday when I was creating a cluster in a shared VPC where the subnets were also shared. I didn't give the subnet IDs in the cluster.yaml, and that's when I got this issue. Giving the subnet IDs in the cluster.yaml fixed the issue for me.

pallavikamboj123 commented on Apr 26, 2023
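
As the last comment notes, when the cluster runs in a shared VPC with pre-existing subnets, the cluster spec references those subnets by ID via the id field under spec.subnets. A minimal sketch of that shape (the subnet IDs below are placeholders, not values from any real cluster):

spec:
  networkID: vpc-00adc685acade9613    # existing shared VPC (value from the manifest above)
  subnets:
  - id: subnet-0aaaaaaaaaaaaaaa1      # placeholder ID of an existing private subnet
    name: ap-southeast-2a
    type: Private
    zone: ap-southeast-2a
  - id: subnet-0bbbbbbbbbbbbbbb2      # placeholder ID of an existing utility subnet
    name: utility-ap-southeast-2a
    type: Utility
    zone: ap-southeast-2a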