
Panic if instance was deleted in OpenStack manually


/kind bug

What steps did you take and what happened: In any setup, create a MachineDeployment, wait for the Machines and OpenStackMachines to reach the Ready state, then delete the instance inside the OpenStack project (via the CLI or the Horizon UI). The CAPO controller crashes with a nil pointer dereference.

Traced it to reconcileNormal() in openstackmachine_controller.go:

	var instanceStatus *compute.InstanceStatus
	if instanceStatus, err = computeService.GetInstanceStatus(*machineServer.Status.InstanceID); err != nil {
		return ctrl.Result{}, err
	}

	instanceNS, err := instanceStatus.NetworkStatus() // <- nil pointer dereference happens here

Which leads to

instance.go

func (s *Service) GetInstanceStatus(resourceID string) (instance *InstanceStatus, err error) {
	if resourceID == "" {
		return nil, fmt.Errorf("resourceId should be specified to get detail")
	}

	server, err := s.getComputeClient().GetServer(resourceID)
	if err != nil {
		if capoerrors.IsNotFound(err) {
			return nil, nil // <- returns nil, nil, so the caller ends up calling NetworkStatus() on a nil *InstanceStatus

		}
		return nil, fmt.Errorf("get server %q detail failed: %v", resourceID, err)
	}

	return &InstanceStatus{server, s.scope.Logger()}, nil
}
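
A nil check on the caller side would avoid the panic. The sketch below is only an illustration of that idea, not necessarily the fix that landed upstream; it assumes the surrounding variables from the reconcileNormal() snippet above, that ctx is in scope, and that ctrl (sigs.k8s.io/controller-runtime) and time are imported. The 30-second requeue interval is an arbitrary illustrative value.

	var instanceStatus *compute.InstanceStatus
	if instanceStatus, err = computeService.GetInstanceStatus(*machineServer.Status.InstanceID); err != nil {
		return ctrl.Result{}, err
	}
	if instanceStatus == nil {
		// GetInstanceStatus returns nil, nil when the server no longer exists,
		// e.g. because it was deleted manually in OpenStack. Requeue instead of
		// dereferencing the nil pointer in NetworkStatus().
		ctrl.LoggerFrom(ctx).Info("OpenStack instance not found, requeueing", "instanceID", *machineServer.Status.InstanceID)
		return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
	}

	instanceNS, err := instanceStatus.NetworkStatus()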

Logs:

2025-01-29T18:06:39.883385751+03:00 panic: runtime error: invalid memory address or nil pointer dereference [recovered]
2025-01-29T18:06:39.883477120+03:00     panic: runtime error: invalid memory address or nil pointer dereference
2025-01-29T18:06:39.883490834+03:00 [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1b8f83a]
2025-01-29T18:06:39.883499159+03:00
2025-01-29T18:06:39.883505997+03:00 goroutine 350 [running]:
2025-01-29T18:06:39.883516576+03:00 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
2025-01-29T18:06:39.883523684+03:00     /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:111 +0x1e5
2025-01-29T18:06:39.883570207+03:00 panic({0x1dccbe0?, 0x362a670?})
2025-01-29T18:06:39.883599681+03:00     /usr/local/go/src/runtime/panic.go:770 +0x132
2025-01-29T18:06:39.883625108+03:00 sigs.k8s.io/cluster-api-provider-openstack/pkg/cloud/services/compute.(*InstanceStatus).NetworkStatus(0x0)
2025-01-29T18:06:39.883630041+03:00     /workspace/pkg/cloud/services/compute/instance_types.go:138 +0x3a
2025-01-29T18:06:39.883634741+03:00 sigs.k8s.io/cluster-api-provider-openstack/controllers.(*OpenStackMachineReconciler).reconcileNormal(0xc000482300, {0x2440550, 0xc000a27860}, 0xc000a530b0, {0xc0002c4d08, 0x18}, 0xc000a04b08, 0xc000a86f08, 0xc000a86a08)
2025-01-29T18:06:39.883638961+03:00     /workspace/controllers/openstackmachine_controller.go:380 +0x1fc
2025-01-29T18:06:39.883643634+03:00 sigs.k8s.io/cluster-api-provider-openstack/controllers.(*OpenStackMachineReconciler).Reconcile(0xc000482300, {0x2440550, 0xc000a27860}, {{{0xc00053f710?, 0x0?}, {0xc00034d680?, 0xc000909d10?}}})
2025-01-29T18:06:39.883647757+03:00     /workspace/controllers/openstackmachine_controller.go:161 +0xbd8
2025-01-29T18:06:39.883661273+03:00 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x24467c8?, {0x2440550?, 0xc000a27860?}, {{{0xc00053f710?, 0xb?}, {0xc00034d680?, 0x0?}}})
2025-01-29T18:06:39.883665768+03:00     /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114 +0xb7
2025-01-29T18:06:39.883670720+03:00 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0001242c0, {0x2440588, 0xc0006ddef0}, {0x1e96420, 0xc00045f980})
2025-01-29T18:06:39.883674642+03:00     /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311 +0x3bc
2025-01-29T18:06:39.883682546+03:00 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0001242c0, {0x2440588, 0xc0006ddef0})
2025-01-29T18:06:39.883690454+03:00     /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261 +0x1be
2025-01-29T18:06:39.883695090+03:00 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
2025-01-29T18:06:39.883699069+03:00     /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222 +0x79
2025-01-29T18:06:39.883925257+03:00 created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 207
2025-01-29T18:06:39.883933278+03:00     /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:218 +0x486

What did you expect to happen: That CAPO would handle the absence of the instance and update the status of the OpenStackMachine, either by setting a condition or by requeueing.
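
For illustration, the missing instance could also be surfaced as a condition on the OpenStackMachine rather than (or in addition to) requeueing. This is a minimal sketch only: it assumes the CAPI conditions helpers (sigs.k8s.io/cluster-api/util/conditions imported as conditions, clusterv1 for the CAPI API types, infrav1 for the CAPO API types), that openStackMachine is the reconciled object, and the "InstanceNotFound" reason string is purely illustrative, not a CAPO constant.

	if instanceStatus == nil {
		// Sketch only: record the out-of-band deletion on the OpenStackMachine
		// instead of panicking in NetworkStatus().
		conditions.MarkFalse(openStackMachine,
			infrav1.InstanceReadyCondition,
			"InstanceNotFound",
			clusterv1.ConditionSeverityError,
			"OpenStack instance %s was not found", *machineServer.Status.InstanceID)
		return ctrl.Result{}, nil
	}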


Environment:

  • Cluster API Provider OpenStack version (Or git rev-parse HEAD if manually built): v0.11.3
  • Cluster-API version: v1.8.4
  • OpenStack version: Antelope 2023.1
  • Kubernetes version (use kubectl version): v1.30.2
  • OS (e.g. from /etc/os-release): rocky-9

71g3pf4c3 avatar Jan 29 '25 16:01 71g3pf4c3

This issue is valid and will be addressed as part of https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/2379. However, I suspect no backport will be possible.

EmilienM avatar Jan 29 '25 17:01 EmilienM

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Apr 29 '25 18:04 k8s-triage-robot

/remove-lifecycle stale

mnaser avatar Apr 29 '25 18:04 mnaser

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 28 '25 19:07 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Aug 27 '25 19:08 k8s-triage-robot

/remove-lifecycle stale

mnaser avatar Aug 28 '25 15:08 mnaser

/remove-lifecycle rotten

mnaser avatar Sep 03 '25 15:09 mnaser

This issue has already been fixed by https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/2475 and https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/2478.

okozachenko1203 avatar Sep 04 '25 00:09 okozachenko1203

@71g3pf4c3 Could you please look into the latest comment and confirm if the issue is indeed fixed? Thanks.

bnallapeta avatar Sep 18 '25 06:09 bnallapeta

> @71g3pf4c3 Could you please look into the latest comment and confirm if the issue is indeed fixed? Thanks.

Yup, everything is fine now. Thanks for providing support!

71g3pf4c3 avatar Sep 19 '25 07:09 71g3pf4c3