container-linux-update-operator
container-linux-update-operator copied to clipboard
MOTD is not updated
The MOTD on Container Linux claims that the reboot strategy is off. When CLUO is running, it should update the message to be accurate.
Dupe of https://github.com/coreos/bugs/issues/1968, I'll leave this here for tracking any work needed here.
I think what we need here is for the agent to indicate that it's managing updates.
I also think the indication should be fine-grained to the point of including:
- Running status for the agent
- Whether updates are paused/disabled
There are a couple options for this that make sense.
- Bindmounted file (with update paused/disabled status) + a flock-x
- Register a dbus handler that can be queried for update status
I'm going to propose going with the file approach since it's simpler.
The format can be a simple bash-sourcable KEY=value
file similar to update.conf
.
I'll suggest idly /var/lib/update-engine/coordinator.conf
with the only keys being NAME="K8s Update Operator"
and STATUS="running|paused"
, though maybe the latter is unneeded.
In addition, I'll suggest that it must keep a flock on the file as often as possible.
This may be obvious to everybody reading this except me, but the update strategy message is not part of MOTD. It is instead generated for every login shell as as part of profile.d
just by probing is-active locksmithd
. Script is at /usr/share/baselayout/coreos-profile.sh
.
~~Looking at node-agent deployment it seems to already have a bind-mounted /var/run/
.~~
@euank what about:
- dropping a file in
run
instead oflib
, to make it reflect status only for the current boot - tweak current baselayout profile.sh to
s/No Reboots/locksmithd - disabled/
- augment profile.sh to read k-v from the
coordinator.conf
and printUpdate Strategy: $NAME - $STATUS
defaulting to the labels above
Also, are we concerned about sourcing a file written from a container on every login shell (including root)? What is the recommended safe way in bash to read&print data without intermediate evaluation?
EDIT: I was looking at the wrong agent manifest.
@lucab I should have mentioned that's how it's updated, yes. My above comment was so that said profile.sh could read from it.
dropping a file in run instead of lib, to make it reflect status only for the current boot
That does sound better, good idea!
tweak current baselayout profile.sh to s/No Reboots/locksmithd - disabled/ augment profile.sh to read k-v from the coordinator.conf and print Update Strategy: $NAME - $STATUS defaulting to the labels above
That's close to what I was thinking, yeah. That's why I chose those two pieces of information.
Also, are we concerned about sourcing a file written from a container on every login shell (including root)? What is the recommended safe way in bash to read&print data without intermediate evaluation?
I'm not concerned about the performance. I thought about the security aspects a little, but since both run
and lib
would be only writeable by root I don't think it's so scary. Anyone that can write files as root at an arbitrary host location can already achieve root/DoS fairly trivially, so adding another path from "write to file as root -> code runs" doesn't seem like it introduces any new security issues.
If we really want to avoid sourcing it, we could suitably constrain it to be parseable by awk/sed.
Did you have anything particular in mind?
If we really want to avoid sourcing it, we could suitably constrain it to be parseable by awk/sed. Did you have anything particular in mind?
No, I like you plan, +1 on that. I was just wondering about having a real data-only .conf
instead of a potential script to source and I wanted to learn about best-practices (my bash-fu is weak).
@dghubble if you're interested in tackling the CLUO side of things, the changes made to locksmithctl/daemon.go
here are pretty close to what cluo will need as well, I'd guess.
Sprinkling UpdateState
calls in the appropriate locations should be enough to resolve this issue on Container Linux versions of at least 1520. There's no harm in this on older container linux instances, so rolling it out doesn't need to gate on the CL changes hitting stable.