cluster-api-provider-aws icon indicating copy to clipboard operation
cluster-api-provider-aws copied to clipboard

Panic Error: Invalid Memory Address or Nil Pointer Dereference during Taint Deletion in AWSManagedMachinePool

Open nueavv opened this issue 1 year ago • 4 comments
trafficstars

/kind bug

What steps did you take and what happened: While managing an EKS cluster using AWSManagedMachinePool, a panic error occurs when trying to delete a taint after adding it. Below are the related log messages.

Initial Error Log
E0613 05:58:09.596065 1 controller.go:329] "Reconciler error" err="panic: runtime error: invalid memory address or nil pointer dereference [recovered]" controller="awsmanagedmachinepool" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSManagedMachinePool" AWSManagedMachinePool="capi-managed-cluster/test-cluster" namespace="capi-managed-cluster" name="test-cluster" reconcileID="12345678-1234-1234-1234-123456789012"
I0613 05:58:09.917851 1 awsmanagedmachinepool_controller.go:200] "Reconciling AWSManagedMachinePool"
I0613 05:58:09.918011 1 launchtemplate.go:73] "checking for existing launch template"
E0613 05:58:10.373797 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 464 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1234560?, 0x1234560})
/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:75 +0x85
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:108 +0xb2
panic({0x1234560?, 0x1234560})
/usr/local/go/src/runtime/panic.go:914 +0x21f
sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/converters.TaintsFromSDK({0xc001604d60, 0x1, 0x25})
/workspace/pkg/cloud/converters/eks.go:115 +0x258
sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/eks.(*NodegroupService).createTaintsUpdate(0xc002522d20, {0x0, 0x0, 0x0}, 0xc00075c5a0)
/workspace/pkg/cloud/services/eks/nodegroup.go:420 +0x21d
sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/eks.(*NodegroupService).reconcileNodegroupConfig(0xc002522d20, 0xc00075c5a0)
/workspace/pkg/cloud/services/eks/nodegroup.go:469 +0x36e
sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/eks.(*NodegroupService).reconcileNodegroup(0xc002522d20, {0x1234560, 0xc002528570})
/workspace/pkg/cloud/services/eks/nodegroup.go:571 +0x72c
sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/eks.(*NodegroupService).ReconcilePool(0xc002522d20, {0x1234560, 0xc002528570})
/workspace/pkg/cloud/services/eks/eks.go:109 +0x11a
sigs.k8s.io/cluster-api-provider-aws/v2/exp/controllers.(*AWSManagedMachinePoolReconciler).reconcileNormal(0xc001541980, {0x1234560, 0xc002528570}, 0xc0010baf00, {0x1234560, 0xc002138100})
/workspace/exp/controllers/awsmanagedmachinepool_controller.go:239 +0x4da
sigs.k8s.io/cluster-api-provider-aws/v2/exp/controllers.(*AWSManagedMachinePoolReconciler).Reconcile(0xc001541980, {0x1234560, 0xc002528570}, {{{0xc000913320?, 0x0?}, {0xc000913380?, 0xc000d75d08?}}})
/workspace/exp/controllers/awsmanagedmachinepool_controller.go:192 +0x8b7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x1234560?, {0x1234560?, 0xc002528570?}, {{{0xc000913320?, 0xb?}, {0xc000913380?, 0x0?}}})
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0014f4500, {0x1234560, 0xc0014264b0}, {0x1234560?, 0xc002360ea0?})
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316 +0x3cc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0014f4500, {0x1234560, 0xc0014264b0})
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266 +0x1af
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg.internal.controller.(*Controller).Start.func2 in goroutine 321
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:223 +0x565
Updated Error Log
E0613 07:06:08.244153 1 controller.go:329] "Reconciler error" err=<
failed to reconcile machine pool for AWSManagedMachinePool capi-managed-cluster/test-cluster: failed to reconcile nodegroup config: creating taints update payload: converting taints: taint has nil fields: {
Effect: "PREFER_NO_SCHEDULE",
Key: "test"
}

What did you expect to happen: Taints should be deleted without causing a panic error.

Anything else you would like to add:

  • The panic error occurs when trying to delete a taint after adding it.
  • The error message indicates that some fields of the taint are nil, but the provided taints are complete.

Current TaintsFromSDK function implementation:

import (
	"fmt"
	"log"
)

// TaintsFromSDK is used to convert an array of AWS SDK taints to CAPA Taints.
func TaintsFromSDK(taints []*eks.Taint) (expinfrav1.Taints, error) {
	converted := expinfrav1.Taints{}

	// If taints array is empty, log the event
	if len(taints) == 0 {
		log.Println("Taints array is empty")
		return converted, nil
	}

	for _, taint := range taints {
		if taint.Effect == nil || taint.Key == nil {
			return nil, fmt.Errorf("taint has nil fields: %+v", taint)
		}

		// Handle nil Value field by setting it to an empty string
		value := ""
		if taint.Value != nil {
			value = *taint.Value
		}

		convertedEffect, err := TaintEffectFromSDK(*taint.Effect)
		if err != nil {
			return nil, fmt.Errorf("converting taint effect %s: %w", *taint.Effect, err)
		}
		converted = append(converted, expinfrav1.Taint{
			Effect: convertedEffect,
			Key:    *taint.Key,
			Value:  value,
		})
	}

	return converted, nil
}

Suggestions:

Resolve the panic error that occurs during taint deletion. When the taint array is empty, ensure that existing taints are deleted. Review the logic handling empty taint arrays and nil fields.

Environment:

Cluster-api-provider-aws version: [version] 2.5.0 Kubernetes version: (use kubectl version): [version] 1.25.5 on local kubernetes OS (e.g. from /etc/os-release): [version] ubuntu 20.04

nueavv avatar Jun 13 '24 07:06 nueavv