mantle
start/stop a machine feature request
@yifan-gu had a test he wanted to write for pluton where:
1, worker is down
2, delete pod and checkpointer on master
3, worker comes back, still running pod and checkpointer
4, master comes back, worker now sees no pod and checkpointer being scheduled, so checkpointer cleans up everything
In this case it'd be nice if we could issue a `machine.Stop()` and a `machine.Start()` command instead of a reboot. Not sure if this is a feature we should bake into the platform package or not. I think we can work around this for now by issuing `machine.SSH("sudo systemctl mask kubelet.service")` and `machine.Reboot()` for our stop command. Then run `machine.SSH("sudo systemctl enable --now kubelet.service")` for our start command.
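The workaround above could be wrapped in two helpers, roughly as sketched below. This is only an illustration: the `Machine` interface, the helper names `stopKubelet` and `startKubelet`, and the `fakeMachine` recorder are hypothetical and are not taken from the platform package. The extra `unmask` step is also an assumption added here, since as far as I know systemd will not enable a unit while it is masked.

```go
package main

import (
	"fmt"
	"strings"
)

// Machine is a hypothetical, minimal stand-in for the machine type discussed
// in this thread; the real interface in the platform package may differ.
type Machine interface {
	SSH(cmd string) ([]byte, error)
	Reboot() error
}

// stopKubelet masks the kubelet unit and reboots, so the kubelet stays down
// once the machine is back up.
func stopKubelet(m Machine) error {
	if _, err := m.SSH("sudo systemctl mask kubelet.service"); err != nil {
		return fmt.Errorf("masking kubelet: %v", err)
	}
	if err := m.Reboot(); err != nil {
		return fmt.Errorf("rebooting: %v", err)
	}
	return nil
}

// startKubelet undoes stopKubelet. The unmask command is an assumption added
// for this sketch; the enable command is the one proposed in the thread.
func startKubelet(m Machine) error {
	cmds := []string{
		"sudo systemctl unmask kubelet.service",
		"sudo systemctl enable --now kubelet.service",
	}
	for _, c := range cmds {
		if _, err := m.SSH(c); err != nil {
			return fmt.Errorf("running %q: %v", c, err)
		}
	}
	return nil
}

// fakeMachine records the calls it receives so the sketch runs without a
// real machine.
type fakeMachine struct{ log []string }

func (f *fakeMachine) SSH(cmd string) ([]byte, error) {
	f.log = append(f.log, "ssh: "+cmd)
	return nil, nil
}

func (f *fakeMachine) Reboot() error {
	f.log = append(f.log, "reboot")
	return nil
}

func main() {
	m := &fakeMachine{}
	if err := stopKubelet(m); err != nil {
		fmt.Println(err)
		return
	}
	if err := startKubelet(m); err != nil {
		fmt.Println(err)
		return
	}
	// Print the sequence of operations the helpers issued.
	fmt.Println(strings.Join(m.log, "\n"))
}
```

A test would call `stopKubelet` on the worker, perform the master-side steps, and then call `startKubelet`, which keeps the stop/start intent readable even though a reboot is still happening underneath.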
Another data point here is that I think upstream will use iptables to blackhole certain nodes for destructive tests. Seems like that could be a separate feature that could be implemented in a fully platform-independent way. `Reboot()` is a platform-specific implementation right now, I believe.
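To make the iptables idea concrete, here is a rough sketch of how the blackhole commands might be generated and then sent to a node over SSH. The helper name `blackholeCmds` and the address `10.0.0.2` are made up for illustration, and this is not necessarily how upstream does it. The sketch drops traffic to and from a single peer rather than everything, on the assumption that the test still needs SSH access to the node.

```go
package main

import "fmt"

// blackholeCmds returns the iptables commands that drop all traffic
// exchanged with peerIP (insert=true) or remove those rules again
// (insert=false). The helper is hypothetical, not part of any package.
func blackholeCmds(peerIP string, insert bool) []string {
	op := "-D" // delete the matching rule
	if insert {
		op = "-I" // insert the rule at the head of the chain
	}
	return []string{
		fmt.Sprintf("sudo iptables %s INPUT -s %s -j DROP", op, peerIP),
		fmt.Sprintf("sudo iptables %s OUTPUT -d %s -j DROP", op, peerIP),
	}
}

func main() {
	// 10.0.0.2 is a placeholder for the address of the node to cut off.
	for _, c := range blackholeCmds("10.0.0.2", true) {
		fmt.Println(c)
	}
	for _, c := range blackholeCmds("10.0.0.2", false) {
		fmt.Println(c)
	}
}
```

Because the commands only depend on SSH access, each one could be passed to `machine.SSH(...)` on any platform, which is what would make this approach platform independent.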
cc @marineam
@pbx0 Thanks for filing the issue. For now I can try to just stop/start kubelet + machine reboot to simulate this. Will report back how that works.
I've also been wanting a `machine.Start()` to provide a more flexible machine setup by being able to call methods to configure networking and so on before booting the machine for the first time.
I just want to note that although start/stop will be easy to implement on GCE and AWS, the QEMU code will need some reworking in order to do that. So for now please stick with finding alternative ways to implement such tests, as I shouldn't get sucked into redoing that code just yet.
Definitely not a big priority for us. Just documenting.