cluster-api-provider-openstack
Panic if an instance was deleted manually in OpenStack
/kind bug
What steps did you take and what happened: In any setup, create a MachineDeployment, wait for the Machines and OpenStackMachines to reach Ready state, then delete the instance inside the OpenStack project (via the CLI or the Horizon UI). The CAPO controller crashes with a nil pointer dereference.
Traced it to `reconcileNormal()` in `openstackmachine_controller.go`:

```go
var instanceStatus *compute.InstanceStatus
if instanceStatus, err = computeService.GetInstanceStatus(*machineServer.Status.InstanceID); err != nil {
	return ctrl.Result{}, err
}

instanceNS, err := instanceStatus.NetworkStatus() // <- here we get the nil pointer dereference
```
Which leads to `instance.go`:

```go
func (s *Service) GetInstanceStatus(resourceID string) (instance *InstanceStatus, err error) {
	if resourceID == "" {
		return nil, fmt.Errorf("resourceId should be specified to get detail")
	}
	server, err := s.getComputeClient().GetServer(resourceID)
	if err != nil {
		if capoerrors.IsNotFound(err) {
			return nil, nil // <- returns nil, nil, which makes instanceStatus.NetworkStatus() panic
		}
		return nil, fmt.Errorf("get server %q detail failed: %v", resourceID, err)
	}
	return &InstanceStatus{server, s.scope.Logger()}, nil
}
```
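Since "not found" is deliberately reported as `(nil, nil)` rather than an error, the caller must check the returned status for nil before using it. A minimal, self-contained sketch of the missing guard, using simplified stand-in types rather than the real CAPO API:

```go
package main

import "fmt"

// InstanceStatus is a simplified stand-in for the CAPO type.
type InstanceStatus struct{ name string }

// NetworkStatus dereferences the receiver, so calling it on a nil
// *InstanceStatus panics, mirroring the crash in the traceback.
func (s *InstanceStatus) NetworkStatus() string {
	return "network of " + s.name
}

// getInstanceStatus mimics Service.GetInstanceStatus: a missing
// instance is not an error, so both return values are nil.
func getInstanceStatus(id string) (*InstanceStatus, error) {
	if id == "deleted" {
		return nil, nil // instance was removed in OpenStack
	}
	return &InstanceStatus{name: id}, nil
}

// reconcile shows the guard the controller needs before touching
// the returned status.
func reconcile(id string) (string, error) {
	status, err := getInstanceStatus(id)
	if err != nil {
		return "", err
	}
	// The missing check: without it, status.NetworkStatus() is a
	// method call on a nil pointer, exactly as in the panic above.
	if status == nil {
		return "instance not found; requeue", nil
	}
	return status.NetworkStatus(), nil
}

func main() {
	out, _ := reconcile("deleted")
	fmt.Println(out)
	out, _ = reconcile("vm-1")
	fmt.Println(out)
}
```

The real fix would additionally set a condition or mark the OpenStackMachine as failed instead of returning a plain string, but the shape of the guard is the same.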
Logs:

```
2025-01-29T18:06:39.883385751+03:00 panic: runtime error: invalid memory address or nil pointer dereference [recovered]
2025-01-29T18:06:39.883477120+03:00 panic: runtime error: invalid memory address or nil pointer dereference
2025-01-29T18:06:39.883490834+03:00 [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1b8f83a]
2025-01-29T18:06:39.883499159+03:00
2025-01-29T18:06:39.883505997+03:00 goroutine 350 [running]:
2025-01-29T18:06:39.883516576+03:00 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
2025-01-29T18:06:39.883523684+03:00 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:111 +0x1e5
2025-01-29T18:06:39.883570207+03:00 panic({0x1dccbe0?, 0x362a670?})
2025-01-29T18:06:39.883599681+03:00 /usr/local/go/src/runtime/panic.go:770 +0x132
2025-01-29T18:06:39.883625108+03:00 sigs.k8s.io/cluster-api-provider-openstack/pkg/cloud/services/compute.(*InstanceStatus).NetworkStatus(0x0)
2025-01-29T18:06:39.883630041+03:00 /workspace/pkg/cloud/services/compute/instance_types.go:138 +0x3a
2025-01-29T18:06:39.883634741+03:00 sigs.k8s.io/cluster-api-provider-openstack/controllers.(*OpenStackMachineReconciler).reconcileNormal(0xc000482300, {0x2440550, 0xc000a27860}, 0xc000a530b0, {0xc0002c4d08, 0x18}, 0xc000a04b08, 0xc000a86f08, 0xc000a86a08)
2025-01-29T18:06:39.883638961+03:00 /workspace/controllers/openstackmachine_controller.go:380 +0x1fc
2025-01-29T18:06:39.883643634+03:00 sigs.k8s.io/cluster-api-provider-openstack/controllers.(*OpenStackMachineReconciler).Reconcile(0xc000482300, {0x2440550, 0xc000a27860}, {{{0xc00053f710?, 0x0?}, {0xc00034d680?, 0xc000909d10?}}})
2025-01-29T18:06:39.883647757+03:00 /workspace/controllers/openstackmachine_controller.go:161 +0xbd8
2025-01-29T18:06:39.883661273+03:00 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x24467c8?, {0x2440550?, 0xc000a27860?}, {{{0xc00053f710?, 0xb?}, {0xc00034d680?, 0x0?}}})
2025-01-29T18:06:39.883665768+03:00 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114 +0xb7
2025-01-29T18:06:39.883670720+03:00 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0001242c0, {0x2440588, 0xc0006ddef0}, {0x1e96420, 0xc00045f980})
2025-01-29T18:06:39.883674642+03:00 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311 +0x3bc
2025-01-29T18:06:39.883682546+03:00 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0001242c0, {0x2440588, 0xc0006ddef0})
2025-01-29T18:06:39.883690454+03:00 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261 +0x1be
2025-01-29T18:06:39.883695090+03:00 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
2025-01-29T18:06:39.883699069+03:00 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222 +0x79
2025-01-29T18:06:39.883925257+03:00 created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 207
2025-01-29T18:06:39.883933278+03:00 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:218 +0x486
```
What did you expect to happen: That CAPO would handle the absence of the instance and update the status of the OpenStackMachine by setting a condition or requeueing.
Environment:
- Cluster API Provider OpenStack version (or `git rev-parse HEAD` if manually built): v0.11.3
- Cluster-API version: v1.8.4
- OpenStack version: Antelope 2023.1
- Kubernetes version (use `kubectl version`): v1.30.2
- OS (e.g. from `/etc/os-release`): rocky-9
This issue is valid and will be addressed within https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/2379. However, I suspect no backport will be possible.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle stale
/remove-lifecycle rotten
This issue has already been fixed by https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/2475 and https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/2478.
@71g3pf4c3 Could you please look into the latest comment and confirm if the issue is indeed fixed? Thanks.
Yup, everything is fine now. Thanks for providing support!