container-linux-update-operator

Distro support

Open kfox1111 opened this issue 7 years ago • 5 comments

Are there any plans to update this operator to work with RHEL/CentOS? Conceptually, there doesn't seem to be much that is CoreOS-specific about it. Perhaps it works already?

kfox1111 Oct 15 '18 18:10

CLUO does end up being pretty Container Linux-specific, particularly in the way that it ties into update_engine to poll for updates. Since CLUO doesn't have any real control over the underlying update process, it's really just locksmith running as a DaemonSet in Kubernetes. In general, we prefer newer tools that have much more direct control over the update process, such as the machine config daemon (MCD), which ties into rpm-ostree directly to update the operating system. That one is specifically for Red Hat CoreOS right now.

There was some early exploratory work that integrated this codebase directly with rpm-ostree (https://github.com/ashcrow/container-linux-update-operator/tree/spike), but the focus has been on the MCD system. As far as I know, there is no equivalent tool that integrates with dnf or any other package management system.

sdemos Oct 15 '18 20:10

What about a yum plugin that calls 'locksmithctl send-need-reboot' on any change? It may reboot more than needed, but it could work? Alternatively, could you just bypass locksmith and label the node directly? Would the rest of the reboot logic work in that case?

kfox1111 Oct 15 '18 20:10

Sorry for the confusion. I meant that it is architecturally and behaviorally like locksmith, not that it is literally locksmith. The CLUO agent hooks directly into update_engine through its exposed D-Bus API (https://github.com/coreos/container-linux-update-operator/blob/4bb1486f482bc9c365c71e126129e806b5a0fc97/pkg/updateengine/client.go#L61), and whenever update_engine applies a new update (entirely out of band, like on any Container Linux instance), the reboot coordinator component ensures that only one node gets rebooted at a time. The reboot logic might work, but again, there is nothing in CLUO that actually triggers an update, and it's not architected to do that.

sdemos Oct 15 '18 21:10
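
To illustrate the "only one gets rebooted at a time" behavior described in the comment above, here is a minimal in-memory sketch in Python. It is not CLUO's code (CLUO is written in Go and works through the Kubernetes API); the `Node` fields and the function names `grant_next_reboot` and `finish_reboot` are hypothetical and chosen only to show the coordination idea.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for the per-node state a coordinator would track."""
    name: str
    reboot_needed: bool = False       # reported by the node agent
    ok_to_reboot: bool = False        # permission granted by the coordinator
    reboot_in_progress: bool = False  # node is currently draining or rebooting


def grant_next_reboot(nodes, max_concurrent=1):
    """Grant reboot permission so that at most max_concurrent nodes hold it.

    Returns the names of the nodes granted permission in this pass.
    """
    # Nodes that already hold permission or are mid-reboot occupy a slot.
    active = sum(1 for n in nodes if n.ok_to_reboot or n.reboot_in_progress)
    granted = []
    for node in sorted(nodes, key=lambda n: n.name):
        if active >= max_concurrent:
            break
        if node.reboot_needed and not node.ok_to_reboot and not node.reboot_in_progress:
            node.ok_to_reboot = True
            active += 1
            granted.append(node.name)
    return granted


def finish_reboot(node):
    """Clear all reboot state once a node is back and healthy, freeing its slot."""
    node.reboot_needed = False
    node.ok_to_reboot = False
    node.reboot_in_progress = False
```

With two nodes flagged as needing a reboot, the first call to `grant_next_reboot` grants permission to one of them, a second call grants nothing while that node still holds the slot, and the other node is only granted permission after `finish_reboot` is called on the first. Note that nothing in this loop depends on how `reboot_needed` was set, which is the part of the design that is arguably not distro-specific.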

I think we always intended that support for Fedora/RHEL would be designed quite differently, as a different reboot coordinator app.

dghubble Oct 15 '18 21:10

Do you see the logic around picking nodes, draining, rebooting, and uncordoning as being distro-specific? I could see the node agent being specific. Does the reboot manager pay attention to any state other than whether a node needs upgrading?

I was thinking of trying to set up Ansible to point yum at the new version repo (we version mirror snapshots), run yum upgrade, and trigger locksmith, then let the operator reboot things safely. CI/CD would trigger Ansible to upgrade the nodes, and the operator would reboot them safely as needed? Alternatively, it could maybe skip locksmith entirely and just set node labels directly?

kfox1111 Oct 16 '18 00:10
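
The "set node labels directly" idea above could be sketched as a small Python helper that an Ansible task or yum plugin might call after an upgrade. This is only an illustration under stated assumptions: the label key `example.com/reboot-needed` is a placeholder, not the key CLUO actually watches (that would have to be looked up in the CLUO source), and the helper name `reboot_label_command` is hypothetical. It only builds a `kubectl label` command line and does not run it.

```python
def reboot_label_command(node_name, reboot_needed,
                         label_key="example.com/reboot-needed"):
    """Build a kubectl command that marks a node as needing a reboot (or not).

    label_key is a placeholder assumption; substitute whatever key the
    reboot coordinator in use actually reads.
    """
    value = "true" if reboot_needed else "false"
    # --overwrite lets the label be updated if it is already present.
    return ["kubectl", "label", "node", node_name,
            f"{label_key}={value}", "--overwrite"]
```

The returned list can be passed to `subprocess.run` on a host with cluster credentials. Whether the rest of the operator's reboot logic would honor a label set this way is exactly the open question in this thread, so treat this as a starting point for experimentation rather than a supported integration.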