nixops-hetzner

Hetzner: nixops provides no way out when the machine is stuck in boot

Open nh2 opened this issue 8 years ago • 9 comments

Edit: See workaround.


I have a situation with nixops where I set up a Hetzner machine but specified a nonexistent device for a partition.

Now it's stuck in Starting / Obsolete.

I don't seem to be able to tell it to start anew from rescue mode.

Is there functionality in Hetzner to handle this case, without wiping the entire deployment? For example, I think just removing a given machine from the list would make it go through rescue mode on next deploy.


Also, I can't destroy in this case:

% ./ops destroy -d mydeployment
error: Multiple exceptions: please either set 'deployment.hetzner.robotUser' or $HETZNER_ROBOT_USER for machine 'machine-1', please either set 'deployment.hetzner.robotUser' or $HETZNER_ROBOT_USER for machine 'machine-2'

But I have set deployment.hetzner.robotUser.

nh2 avatar Jul 24 '17 21:07 nh2

CC @aszlig

nh2 avatar Jul 24 '17 21:07 nh2

As a workaround, I just dropped machine-1 and machine-2 from the nixops sqlite DB using sqlitebrowser (from the Resources table).

nh2 avatar Jul 24 '17 22:07 nh2

Workaround using sqlite3

nix-shell -p sqlite

(I recommend running rlwrap sqlite3 instead of plain sqlite3 in the commands below so that arrow keys work.)

Deleting a specific machine

To delete (forget) mymachine from mydeployment:

sqlite3 localstate.nixops
DELETE FROM ResourceAttrs WHERE machine = (SELECT id FROM Resources WHERE name = 'mymachine' AND deployment = (SELECT deployment from DeploymentAttrs WHERE name = 'name' AND value = 'mydeployment'));
DELETE FROM Resources WHERE name = 'mymachine' AND deployment = (SELECT deployment from DeploymentAttrs WHERE name = 'name' AND value = 'mydeployment');
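The same two deletions can also be scripted so that they run in one transaction, and a failure halfway through leaves the state file untouched. This is a minimal sketch, not part of nixops: forget_machine is a hypothetical helper, and the table and column names are assumed to be exactly those used in the SQL statements above. Back up the state file before trying anything like this.

```python
import sqlite3


def forget_machine(conn, deployment_name, machine_name):
    """Delete one machine's rows from Resources and ResourceAttrs.

    Hypothetical helper; assumes the table layout used by the SQL above.
    Returns the number of Resources rows removed (0 if nothing matched).
    """
    with conn:  # commits on success, rolls back if any statement raises
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM Resources WHERE name = ? AND deployment IN "
            "(SELECT deployment FROM DeploymentAttrs "
            " WHERE name = 'name' AND value = ?)",
            (machine_name, deployment_name),
        )]
        for machine_id in ids:
            # Remove the attributes first, then the resource row itself.
            conn.execute("DELETE FROM ResourceAttrs WHERE machine = ?", (machine_id,))
            conn.execute("DELETE FROM Resources WHERE id = ?", (machine_id,))
    return len(ids)


# Example call (file name taken from the commands above):
#   conn = sqlite3.connect("localstate.nixops")
#   forget_machine(conn, "mydeployment", "mymachine")
```

Returning the number of removed rows makes it easy to notice a typo in the machine or deployment name, since a plain DELETE that matches nothing succeeds silently.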

Deleting all attributes (including all machines!)

To delete everything in mydeployment:

sqlite3 localstate.nixops
DELETE FROM ResourceAttrs WHERE machine IN (SELECT id FROM Resources WHERE deployment = (SELECT deployment FROM DeploymentAttrs WHERE name = 'name' AND value = 'mydeployment'));
DELETE FROM Resources WHERE deployment = (SELECT deployment from DeploymentAttrs WHERE name = 'name' AND value = 'mydeployment');
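One detail matters when a deployment holds several machines: in SQLite, a subquery compared with = contributes only its first row, whereas IN matches every row it returns. So the ResourceAttrs rows of all machines are only covered by a set comparison. The sketch below shows the difference on an in-memory database. The two-table schema is a simplified assumption for illustration, not the real nixops schema.

```python
import sqlite3

# Simplified stand-in for the state DB: one deployment 'd' with two machines.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Resources(id INTEGER PRIMARY KEY, deployment TEXT, name TEXT);
CREATE TABLE ResourceAttrs(machine INTEGER, name TEXT, value TEXT);
INSERT INTO Resources VALUES (1, 'd', 'machine-1'), (2, 'd', 'machine-2');
INSERT INTO ResourceAttrs VALUES (1, 'state', 'x'), (2, 'state', 'y');
""")


def count_matching(operator):
    """Count ResourceAttrs rows selected when comparing with '=' or 'IN'."""
    sql = ("SELECT COUNT(*) FROM ResourceAttrs WHERE machine " + operator +
           " (SELECT id FROM Resources WHERE deployment = 'd')")
    return conn.execute(sql).fetchone()[0]


matched_with_equals = count_matching("=")   # only the subquery's first row is used
matched_with_in = count_matching("IN")      # every row of the subquery is used
print(matched_with_equals, matched_with_in)
```

Running the same COUNT(*) query against the real state file before the DELETE is a cheap way to check how many rows a statement will touch.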

nh2 avatar Jul 24 '17 22:07 nh2

@nh2: Does it work with nixops destroy --include machine_you_want_to_destroy?

aszlig avatar Jul 24 '17 23:07 aszlig

@aszlig No, same error.

Also note I'm using an "admin user" account and I'm not even sure what destroy should do exactly for Hetzner.

But in any case, it seems we're not even getting to the point where that is relevant, as it seems to fail before that.

nh2 avatar Jul 24 '17 23:07 nh2

Ah, sorry... you did that in the first place. The reason this doesn't work is that destroy uses ROBOT_USER/ROBOT_PASS to access the robot (which apparently weren't passed), removes the vm_id from the server and reboots it into rescue (with --wipe it also uses shred to erase the disks).

aszlig avatar Jul 25 '17 00:07 aszlig

@aszlig Hmm, I can't quite tell from your answer: should this also work with an admin user, or does it only work with the main Hetzner account?

nh2 avatar Jul 25 '17 01:07 nh2

@nh2 Still using your workaround?

coretemp avatar May 11 '18 18:05 coretemp

@coretemp Yes.

nh2 avatar May 11 '18 19:05 nh2