tink
Add ability to reboot the machine after workflow is finished
For workflows that provision an OS, it would be nice if the workflow itself could reboot the machine once it's done, so that the machine boots into the target OS and the upper orchestration layer (e.g. a person who monitors the provisioning process, or some kind of logic which uses IPMI etc.) doesn't need to care about that.
Things to consider:
- a worker can be part of multiple workflows. Perhaps the reboot should only happen when all workflows have finished successfully.
- perhaps a workflow could indicate that a reboot is needed after it's finished, e.g. by setting the reboot parameter to true.
- the action or task can't trigger a reboot by itself, as this will shut down the worker and it won't be able to report that reboot task succeeded
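The first two considerations can be sketched as a small decision function. This is only an illustration: the WorkflowState type, its fields, and the ShouldReboot name are hypothetical and are not part of tink or tink-worker.

```go
package main

// WorkflowState is a hypothetical summary of one workflow assigned to a worker.
type WorkflowState struct {
	ID           string
	Finished     bool // assumed: every action in the workflow reported success
	RebootWanted bool // assumed: the workflow set a "reboot: true" parameter
}

// ShouldReboot reports whether the worker may reboot: every workflow must be
// finished, and at least one of them must have asked for a reboot.
func ShouldReboot(workflows []WorkflowState) bool {
	wanted := false
	for _, wf := range workflows {
		if !wf.Finished {
			// Another workflow is still pending, so rebooting now would interrupt it.
			return false
		}
		if wf.RebootWanted {
			wanted = true
		}
	}
	return wanted
}
```

With this shape, a worker that holds one finished workflow asking for a reboot and one unfinished workflow would not reboot until the second one completes.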
it seems that we now have a documented way to do a reboot from an action at https://docs.tinkerbell.org/actions/action-architecture/#namespace:
When an action attempts to do these steps in a container in its own namespace, nothing will occur as PID 1 is usually the process in the action container. To allow the expected behaviour an action can use pid: host in its configuration, this will mean that the action processes will be amongst all of the processes on the host itself (including the "real" PID 1). With the action in the host process ID namespace both a reboot or kexec will be able to work as expected.
Is this issue about improving on that?
This is fixed in tink-worker. This can probably be closed! 😀
@thebsdbox, by fixed, you mean using an action with pid: host?
having a docs example on how to reboot from a workflow would also be really nice :-)
I found a reboot example at https://docs.tinkerbell.org/deploying-operating-systems/examples-win/#creating-a-reboot-action-dockerfile:
FROM busybox
ENTRYPOINT [ "touch", "/worker/reboot" ]
is that it? we just need to create a new file named /worker/reboot?
Creating a file named /worker/reboot does not trigger a reboot from tink-worker:
Here's the workflow status:
+----------------------+--------------------------------------+
| FIELD NAME | VALUES |
+----------------------+--------------------------------------+
| Workflow ID | be378bb1-bdf9-11eb-9be0-0242ac120005 |
| Workflow Progress | 100% |
| Current Task | hello-world |
| Current Action | reboot |
| Current Worker | 00000000-0000-4000-8000-080027000001 |
| Current Action State | STATE_SUCCESS |
+----------------------+--------------------------------------+
+--------------------------------------+-------------+-------------+----------------+---------------------------------+---------------+
| WORKER ID | TASK NAME | ACTION NAME | EXECUTION TIME | MESSAGE | ACTION STATUS |
+--------------------------------------+-------------+-------------+----------------+---------------------------------+---------------+
| 00000000-0000-4000-8000-080027000001 | hello-world | reboot | 0 | Started execution | STATE_RUNNING |
| 00000000-0000-4000-8000-080027000001 | hello-world | reboot | 0 | finished execution successfully | STATE_SUCCESS |
+--------------------------------------+-------------+-------------+----------------+---------------------------------+---------------+
Ah this needs hook.. hook has the logic to watch for the reboot.
Can we use sysrq-r from an action? https://hub.docker.com/r/mlafeldt/sysrq/ for example.
the action or task can't trigger a reboot by itself, as this will shut down the worker and it won't be able to report that reboot task succeeded
Does the action need to be Tinkerbell specific and act as the worker to signal success?
Built a docker image as per the example @rgl mentioned here already to no avail:
The "touch" is going nowhere and thus the rebootWatch() never fires.
A manual touch in the getty container to "/run/worker/reboot" works, so the watch is active. It just looks like the volume mapping is wrong? (/worker:/worker)
Edit: it works; the workflow was just hanging somehow. Recreated it and it works as advertised:
- build the docker image as in the Windows example
- tag + push it to the local registry
- add the action as in the same example
profi...reboot :)
- name: "reboot into Windows"
  image: reboot:latest
  timeout: 90
  volumes:
    - /worker:/worker
I encountered the same issue when rebooting into Windows: the action failed (STATE_FAILED). Is there any place where I can look up the error message?
It turns out the document is incorrect. I just sent out a PR to fix it.
We intend to draw up a proposal for embedding restart capabilities into workflows so we don't need to rely on actions. This will complement our wish to see workflows consistently transition to an end state, which currently doesn't happen if the restart beats the restart action's status update.
https://github.com/tinkerbell/roadmap/issues/29 will see this come to fruition.