Talosctl reset hanging after checks are completed

Open ChickenIQ opened this issue 1 year ago • 4 comments

Bug Report

Description

After resetting the nodes, talosctl may hang once it has completed the necessary checks. This is inconsistent, but I managed to reproduce it a handful of times during my testing. Manually canceling the command results in the desired outcome.

Command used: talosctl reset --reboot --graceful=false --system-labels-to-wipe EPHEMERAL
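
Since manually canceling produced the desired outcome, one possible stopgap for automation is to bound the command's run time. This is a minimal sketch, assuming GNU coreutils `timeout` is available; `run_bounded` and `RESET_TIMEOUT` are hypothetical names and not part of talosctl, and the 600-second default is an arbitrary placeholder.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command and cancel it if it exceeds a time limit.
# RESET_TIMEOUT is an assumed variable name (seconds), not a talosctl setting.
RESET_TIMEOUT="${RESET_TIMEOUT:-600}"

run_bounded() {
    # timeout(1) sends SIGTERM once the limit expires and then exits with
    # status 124; otherwise it passes through the command's own exit status.
    timeout "$RESET_TIMEOUT" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "command exceeded ${RESET_TIMEOUT}s and was cancelled" >&2
    fi
    return "$status"
}

# Usage, with the command from this report:
# run_bounded talosctl reset --reboot --graceful=false --system-labels-to-wipe EPHEMERAL
```

An exit status of 124 only says that the time limit was hit. Whether the reset itself finished still has to be confirmed separately, for example from the check output shown in the logs.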

Logs

◲ watching nodes: [192.168.1.100 192.168.1.99]
    * 192.168.1.100: service: kubelet message: Health check successful healthy: true
    * 192.168.1.99: post check passed

Environment

  • Talos version:
      Client:
        Tag: v1.7.2
        SHA: f876025b
        Built:
        Go version: go1.22.3
        OS/Arch: linux/amd64
      Server:
        NODE: 192.168.1.99
        Tag: v1.7.2
        SHA: f876025b
        Built:
        Go version: go1.22.3
        OS/Arch: linux/amd64
        Enabled: RBAC
  • Kubernetes version:
      Client Version: v1.30.1
      Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
      Server Version: v1.30.1
  • Platform: metal

ChickenIQ avatar May 25 '24 20:05 ChickenIQ

What are the endpoints (talosctl config info)?

smira avatar May 27 '24 10:05 smira

What are the endpoints (talosctl config info)?

I'm currently unable to reach my computer, but they are 192.168.1.100 and 192.168.1.99, both of which are controlplanes.

Could this be because this is not a highly available configuration?

ChickenIQ avatar May 27 '24 10:05 ChickenIQ

You can't reset both controlplanes, this doesn't make any sense in general, as etcd data will be lost.

I think the issue here is probably still valid.

smira avatar May 27 '24 10:05 smira

You can't reset both controlplanes, this doesn't make any sense in general, as etcd data will be lost.

I think the issue here is probably still valid.

My goal is to fully reset the nodes without them going into maintenance mode, so that my secrets stay valid. I then regenerate them, use the old config to apply them, and bootstrap the cluster again. The result is a clean slate, all automated.

Data loss is not a problem, it is the desired outcome.

ChickenIQ avatar May 27 '24 11:05 ChickenIQ

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Feb 14 '25 02:02 github-actions[bot]

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] avatar Feb 19 '25 02:02 github-actions[bot]