cluster-api-provider-openstack
Panic if an instance was deleted manually in OpenStack
/kind bug
What steps did you take and what happened: In any setup, create a MachineDeployment, wait for the Machines and OpenStackMachines to reach Ready state, then delete the instance inside the OpenStack project (via the CLI or the Horizon UI). The CAPO controller crashes with a nil pointer dereference.
Traced it to `reconcileNormal()` in `openstackmachine_controller.go`:

```go
var instanceStatus *compute.InstanceStatus
if instanceStatus, err = computeService.GetInstanceStatus(*machineServer.Status.InstanceID); err != nil {
	return ctrl.Result{}, err
}

instanceNS, err := instanceStatus.NetworkStatus() // <- here we get the nil pointer dereference
```
Which leads to `instance.go`:

```go
func (s *Service) GetInstanceStatus(resourceID string) (instance *InstanceStatus, err error) {
	if resourceID == "" {
		return nil, fmt.Errorf("resourceId should be specified to get detail")
	}
	server, err := s.getComputeClient().GetServer(resourceID)
	if err != nil {
		if capoerrors.IsNotFound(err) {
			return nil, nil // <- returns nil, nil, which makes instanceStatus.NetworkStatus() panic
		}
		return nil, fmt.Errorf("get server %q detail failed: %v", resourceID, err)
	}
	return &InstanceStatus{server, s.scope.Logger()}, nil
}
```
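Since "not found" is deliberately reported as `(nil, nil)` rather than an error, the caller must check the returned status for nil before using it. A minimal, self-contained sketch of the missing guard, using simplified stand-in types rather than the real CAPO API:

```go
package main

import "fmt"

// InstanceStatus is a simplified stand-in for the CAPO type.
type InstanceStatus struct{ name string }

// NetworkStatus dereferences the receiver, so calling it on a nil
// *InstanceStatus panics, mirroring the crash in the traceback.
func (s *InstanceStatus) NetworkStatus() string {
	return "network of " + s.name
}

// getInstanceStatus mimics Service.GetInstanceStatus: a missing
// instance is not an error, so both return values are nil.
func getInstanceStatus(id string) (*InstanceStatus, error) {
	if id == "deleted" {
		return nil, nil // instance was removed in OpenStack
	}
	return &InstanceStatus{name: id}, nil
}

// reconcile shows the guard the controller needs before touching
// the returned status.
func reconcile(id string) (string, error) {
	status, err := getInstanceStatus(id)
	if err != nil {
		return "", err
	}
	// The missing check: without it, status.NetworkStatus() is a
	// method call on a nil pointer, exactly as in the panic above.
	if status == nil {
		return "instance not found; requeue", nil
	}
	return status.NetworkStatus(), nil
}

func main() {
	out, _ := reconcile("deleted")
	fmt.Println(out)
	out, _ = reconcile("vm-1")
	fmt.Println(out)
}
```

The real fix would additionally set a condition or mark the OpenStackMachine as failed instead of returning a plain string, but the shape of the guard is the same.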
Logs:

```
2025-01-29T18:06:39.883385751+03:00 panic: runtime error: invalid memory address or nil pointer dereference [recovered]
2025-01-29T18:06:39.883477120+03:00 panic: runtime error: invalid memory address or nil pointer dereference
2025-01-29T18:06:39.883490834+03:00 [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1b8f83a]
2025-01-29T18:06:39.883499159+03:00
2025-01-29T18:06:39.883505997+03:00 goroutine 350 [running]:
2025-01-29T18:06:39.883516576+03:00 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
2025-01-29T18:06:39.883523684+03:00 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:111 +0x1e5
2025-01-29T18:06:39.883570207+03:00 panic({0x1dccbe0?, 0x362a670?})
2025-01-29T18:06:39.883599681+03:00 /usr/local/go/src/runtime/panic.go:770 +0x132
2025-01-29T18:06:39.883625108+03:00 sigs.k8s.io/cluster-api-provider-openstack/pkg/cloud/services/compute.(*InstanceStatus).NetworkStatus(0x0)
2025-01-29T18:06:39.883630041+03:00 /workspace/pkg/cloud/services/compute/instance_types.go:138 +0x3a
2025-01-29T18:06:39.883634741+03:00 sigs.k8s.io/cluster-api-provider-openstack/controllers.(*OpenStackMachineReconciler).reconcileNormal(0xc000482300, {0x2440550, 0xc000a27860}, 0xc000a530b0, {0xc0002c4d08, 0x18}, 0xc000a04b08, 0xc000a86f08, 0xc000a86a08)
2025-01-29T18:06:39.883638961+03:00 /workspace/controllers/openstackmachine_controller.go:380 +0x1fc
2025-01-29T18:06:39.883643634+03:00 sigs.k8s.io/cluster-api-provider-openstack/controllers.(*OpenStackMachineReconciler).Reconcile(0xc000482300, {0x2440550, 0xc000a27860}, {{{0xc00053f710?, 0x0?}, {0xc00034d680?, 0xc000909d10?}}})
2025-01-29T18:06:39.883647757+03:00 /workspace/controllers/openstackmachine_controller.go:161 +0xbd8
2025-01-29T18:06:39.883661273+03:00 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x24467c8?, {0x2440550?, 0xc000a27860?}, {{{0xc00053f710?, 0xb?}, {0xc00034d680?, 0x0?}}})
2025-01-29T18:06:39.883665768+03:00 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114 +0xb7
2025-01-29T18:06:39.883670720+03:00 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0001242c0, {0x2440588, 0xc0006ddef0}, {0x1e96420, 0xc00045f980})
2025-01-29T18:06:39.883674642+03:00 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311 +0x3bc
2025-01-29T18:06:39.883682546+03:00 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0001242c0, {0x2440588, 0xc0006ddef0})
2025-01-29T18:06:39.883690454+03:00 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:261 +0x1be
2025-01-29T18:06:39.883695090+03:00 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
2025-01-29T18:06:39.883699069+03:00 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:222 +0x79
2025-01-29T18:06:39.883925257+03:00 created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 207
2025-01-29T18:06:39.883933278+03:00 /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:218 +0x486
```
What did you expect to happen: That CAPO would handle the absence of the instance and update the status of the OpenStackMachine by setting a condition or requeueing.
Environment:
- Cluster API Provider OpenStack version (or `git rev-parse HEAD` if manually built): v0.11.3
- Cluster-API version: v1.8.4
- OpenStack version: Antelope 2023.1
- Kubernetes version (use `kubectl version`): v1.30.2
- OS (e.g. from `/etc/os-release`): rocky-9
This issue is valid and will be addressed within https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/2379. However, I suspect no backport will be possible.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle stale
/remove-lifecycle rotten
This issue has already been fixed by https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/2475 and https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/2478.
@71g3pf4c3 Could you please look into the latest comment and confirm if the issue is indeed fixed? Thanks.
Yup, everything is fine now. Thanks for providing support!