[BUG] Uninstallation fails because deleting the default engine image is not allowed

Open chriscchien opened this issue 5 months ago • 4 comments

Describe the Bug

Uninstallation failed because the admission webhook does not allow deleting the default engine image.

Uninstall pod log

time="2025-06-17T08:20:41Z" level=info msg="Found 1 engineimages remaining" func="controller.(*UninstallController).deleteCRs" file="uninstall_controller.go:569" controller=longhorn-uninstall
time="2025-06-17T08:20:41Z" level=warning msg="Failed to uninstall" func="controller.(*UninstallController).handleErr" file="uninstall_controller.go:294" controller=longhorn-uninstall error="failed to delete engine images: failed to mark for deletion: admission webhook \"validator.longhorn.io\" denied the request: deleting the default engine image longhornio/longhorn-engine:master-head (longhornio/longhorn-engine:master-head) is not allowed"

To Reproduce

  1. Deploy Longhorn master (v1.10 dev)
  2. Set the deleting-confirmation-flag setting to true
  3. Uninstall Longhorn by running kubectl create -f https://raw.githubusercontent.com/longhorn/longhorn/master/uninstall/uninstall.yaml
  4. The longhorn-uninstall job gets stuck

Expected Behavior

The longhorn-uninstall job completes.

Support Bundle for Troubleshooting

supportbundle_76090e17-70a6-40d3-bbf2-8a898cad65c4_2025-06-17T08-24-18Z.zip

Environment

  • Longhorn version: master-head (v1.10 dev)

Additional context

N/A

Workaround and Mitigation

N/A

chriscchien avatar Jun 17 '25 08:06 chriscchien

Maybe we could add an annotation to the engine image CR during uninstallation, just as we did for the backup target CR: https://github.com/longhorn/longhorn-manager/blob/a56819105fd965b73dbc70133e1eb5814d3bea91/controller/uninstall_controller.go#L811-L812
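
To illustrate the idea, here is a minimal, self-contained sketch of how such an annotation could let a delete validator tell an uninstall apart from an ordinary delete. It is not taken from longhorn-manager and may differ from what the eventual fix does: the EngineImage struct, the validateDelete and markForUninstall helpers, and the annotation key are all hypothetical names invented for this example.

```go
package main

import "fmt"

// uninstallAnnotation is a HYPOTHETICAL annotation key made up for this
// sketch; it is not the key used by longhorn-manager.
const uninstallAnnotation = "example.longhorn.io/deleting-for-uninstall"

// EngineImage is a simplified stand-in for the engine image CR, keeping only
// the fields this sketch needs.
type EngineImage struct {
	Name        string
	Image       string
	Annotations map[string]string
}

// validateDelete mimics a validating-webhook check: deleting the default
// engine image is rejected unless the CR carries the uninstall annotation.
func validateDelete(ei EngineImage, defaultImage string) error {
	if ei.Image != defaultImage {
		return nil // non-default images are not covered by this check
	}
	if ei.Annotations[uninstallAnnotation] == "true" {
		return nil // the uninstaller marked this CR before deleting it
	}
	return fmt.Errorf("deleting the default engine image %v is not allowed", ei.Image)
}

// markForUninstall returns a copy of the CR with the uninstall annotation
// set, which is what the uninstaller would do before issuing the delete.
func markForUninstall(ei EngineImage) EngineImage {
	annotations := make(map[string]string, len(ei.Annotations)+1)
	for k, v := range ei.Annotations {
		annotations[k] = v
	}
	annotations[uninstallAnnotation] = "true"
	ei.Annotations = annotations
	return ei
}

func main() {
	defaultImage := "longhornio/longhorn-engine:master-head"
	ei := EngineImage{Name: "ei-default", Image: defaultImage}

	// Without the annotation the delete is denied, as in the uninstall pod log.
	fmt.Println("plain delete:", validateDelete(ei, defaultImage))

	// With the annotation the same delete is allowed.
	fmt.Println("annotated delete:", validateDelete(markForUninstall(ei), defaultImage))
}
```

The point of this design is that the protection for normal operation stays in place, and only a caller that explicitly marks the CR first can bypass it.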

COLDTURNIP avatar Jun 17 '25 08:06 COLDTURNIP

Pre Ready-For-Testing Checklist

  • [x] Where are the reproduce steps/test steps documented? The reproduce steps/test steps are in the ticket description (similar to #11131, can be verified together)

  • [x] Does the PR include the explanation for the fix or the feature?

  • [x] Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)? The PR is at longhorn/longhorn-manager#3860

longhorn-io-github-bot avatar Jun 17 '25 10:06 longhorn-io-github-bot

There's also a similar problem with LHN: the webhook rejects the DELETE request for the Longhorn Node CR.

longhorn-manager-px25m longhorn-manager time="2025-06-17T10:01:49Z" level=warning msg="Rejected operation: Request (user: system:serviceaccount:longhorn-system:longhorn-uninstall-service-account, longhorn.io/v1beta2, Kind=Node, namespace: longhorn-system, name: libvirt-ubuntu-k3s-worker1, operation: DELETE)" func="admission.(*Handler).admit" file="admission.go:106" error="could not delete node libvirt-ubuntu-k3s-worker1 with node ready condition is True, reason is , node schedulable true, and 0 replica, 0 engine running on it" service=admissionWebhook

Filed another ticket: #11131

COLDTURNIP avatar Jun 17 '25 10:06 COLDTURNIP

Solution validated with a patched longhorn-manager (longhorn/longhorn-manager#3860 & longhorn/longhorn-manager#3861): on a cluster that includes an attached volume using a non-default engine image, the uninstallation job completes without any problem.

COLDTURNIP avatar Jun 19 '25 03:06 COLDTURNIP

Verified as passing on Longhorn master (https://github.com/longhorn/longhorn-manager/commit/c36b3d5e1cad9007d32a9072173b7d68206907a5): the Uninstallation Checks passed and the master-head daily regression passed.

chriscchien avatar Jun 24 '25 06:06 chriscchien