eve
[WIP][DO NOT MERGE] Moving containerd content and snapshots into vault
As it turns out, we've neglected one bit during our CAS refactoring: the fact that containerd itself stores content (CAS blobs) and snapshots in the unencrypted location under /persist/containerd.
This is somewhat scary, but the good news is that at least VM images are unaffected (btw, why the heck do we NOT treat them as CAS?)
Now, fixing it is actually trickier than I initially thought. The problem is that all the containerd plugins tend to initialize their backing stores right away when containerd starts. This is before pillar's vaultmgr has a chance to unlock the vault with the required key. Hence a knee-jerk attempt at configuring containerd to store all its data in vault won't work (at least not until we split vaultmgr into something that may need to run before containerd -- which is a conversation @eriknordmark and @cshari-zededa need to have).
Hence, this PR attempts to pull off a few hacks that would allow containerd to come up but perhaps not have actual access to the bits stored as content (CAS) and snapshots. This should be fine (in theory!) since access to them is required only later during the domainmgr lifecycle.
So... this is very lightly tested, but if we all agree that this is the best we can do for now -- so be it and I can test it then.
One other bit of help I need from @deitch is figuring out why setting plugins.content.root in the containerd/config.toml doesn't seem to affect the setting on that plugin. Running ctr plugins ls -d suggested it should be tweakable.
Oh, and finally, I cleaned up zfs filenames a bit -- since /run is available everywhere. @cshari-zededa -- please take a look.
This is somewhat scary, but the good news is that at least VM images are unaffected (btw, why the heck do we NOT treat them as CAS?)
We are, or we are about to. I believe @adarsh-zededa had a PR ready on that, but correct me if I'm wrong.
Hence a knee-jerk attempt at configuring containerd to store all its data in vault won't work (at least not until we split vaultmgr into something that may need to run before containerd -- which is a conversation @eriknordmark and @cshari-zededa need to have).
containerd is a base system tool. It is started as part of init, and installed here. Essentially, that installs init (the init, that the kernel starts on boot), which itself launches containerd.
We only call our storage-init here, as part of onboot, which is called by init (almost: init calls runc to run the onboot, but close enough), which also calls containerd.
If we really want all of this encrypted, then this needs to be an earlier stage, an init type stage. By the time we get to onboot, let alone services, it is assumed that base system stuff (like containerd) is up and running.
We could discuss an even earlier stage, but in truth, that is what init managers (like rc.init) are for.
Let me turn it around. Do we have clear documentation of what vaultmgr does, what its states are, and when it is used/needed? I suspect that it needs to be part of init, but I also suspect that it performs multiple tasks, some of which fit into different stages, and thus may need refactoring, or may not.
Somewhat separately, if we had a clear, good design for filesystem encryption - or better yet, an interface to it with support for different actual implementations - we could make it a standard part of linuxkit, either as part of init or even at a higher level.
One other bit of help I need from @deitch is figuring out why setting plugins.content.root in the containerd/config.toml doesn't seem to affect the setting on that plugin. Running ctr plugins ls -d suggested it should be tweakable
That is strange. I am curious why we do not just set the high-level root to be under the vault - or even leave it where it is but symlink it to somewhere in the vault - and then everything else will just work. We probably wouldn't even need to do some of the changes in this PR.
Let me turn it around. Do we have a clear documentation of what vaultmgr does, what its states are, when it is used/needed? I suspect that it needs to be part of init, but I also suspect that it performs multiple tasks, some of which fit into different stages, and thus may need refactoring, or may not.
Somewhat separately, if we had a clear, good design for filesystem encryption - or better yet, an interface to it with support for different actual implementations - we could make it a standard part of linuxkit, either as part of init or even at a higher level.
@deitch vaultmgr and in general data encryption at rest on EVE are discussed here: https://github.com/lf-edge/eve/blob/master/pkg/pillar/docs/vaultmgr.md https://wiki.lfedge.org/display/EVE/Encrypting+Sensitive+Information+at+Rest+at+the+Edge
vaultmgr interacts with the "Measured Boot and Remote Attestation" functionality, where the vault keys are "escrowed" with the Controller and shared with the device only after successful attestation. Therefore there will be a window where the vault is locked and waiting for the keys, until zedagent comes up and completes the attestation cycle. This is discussed here: https://wiki.lfedge.org/display/EVE/Measured+Boot+and+Remote+Attestation#MeasuredBootandRemoteAttestation-ModuleLevelInteraction-EVEStartupSequence(Rebootwithachange)
until zedagent comes up and completes attestation cycle. This is discussed here:
OK, I remember that now; thanks for bringing it back up. And now we have the chicken and egg problem. containerd needs the vault unlocked before it starts, because it should run on a filesystem in the vault, but unlocking requires zedagent and vaultmgr, which run as containers, which are managed by containerd, which...
An alternative approach might be that we distinguish between user containers in containerd, which are in the vault, and system containers, which are not.
Unfortunately, it isn't at all clear to me how we can do that.
Hey @deitch -- regardless of everything else -- any ideas on why:
[plugins]
[plugins.content]
root = "/var/persist/vault/content"
didn't take?
An alternative approach might be that we distinguish between user containers in containerd, which are in the vault, and system containers, which are not.
Another approach may be to experiment with actually re-starting containerd when we unlock the vault. Theoretically restarting containerd should be a safe operation for all the things that are already running.
Regardless, @deitch it seems that we need to have a bit of a design thinking around this one. I filed a tracker story for you ;-)
And finally, here's one more update on this: the proposed hack didn't quite work after all. I'm not really sure what part of containerd gets unhappy about this type of swap but something does. Hence I'm going to close this PR and open a much smaller subset of it. The rest we will handle through more of a thorough design approach.
[the hostess took up the broom] @rvs do you plan to return to this, or can we close it?
The problem was resolved in another PR: https://github.com/lf-edge/eve/pull/2524
@giggsoff you mean the problem was resolved and now this pr can be merged or at least work on this pr can be continued?
I mean the PR may be closed
all clear, thanks.