[WIP][DO NOT MERGE]Moving containerd content and snapshots into vault

Open rvs opened this issue 5 years ago • 7 comments

As it turns out, we've neglected one bit during our CAS refactoring: the fact that containerd itself stores content (CAS blobs) and snapshots in the unencrypted location under /persist/containerd.

This is somewhat scary, but the good news is that at least VM images are unaffected (btw, why the heck do we NOT treat them as CAS?)

Now, fixing it is actually more tricky than I initially thought. The problem being that all the plugins of containerd tend to initialize their backing stores right away when containerd starts. This is before pillar's vaultmgr has a chance to unlock the vault with the required key. Hence a knee-jerk attempt at configuring containerd to store all its data in vault won't work (at least not until we split vaultmgr into something that may need to run before containerd -- which is a conversation @eriknordmark and @cshari-zededa need to have).

Hence, this PR attempts to pull off a few hacks that would allow containerd to come up but perhaps not have actual access to the bits stored as content (CAS) and snapshots. This should be fine (in theory!) since access to them is required only later during the domainmgr lifecycle.

So... this is very lightly tested, but if we all agree that this is the best we can do for now -- so be it and I can test it then.

One other bit of help I need from @deitch is figuring out why setting plugins.content.root in the containerd/config.toml doesn't seem to affect the setting on that plugin. Running ctr plugins ls -d suggested it should be tweakable.

Oh, and finally, I cleaned up zfs filenames a bit -- since /run is universally available everywhere. @cshari-zededa -- please take a look.

rvs avatar Aug 02 '20 05:08 rvs

This is somewhat scary, but the good news is that at least VM images are unaffected (btw, why the heck do we NOT treat them as CAS?)

We are, or we are about to. I believe @adarsh-zededa had a PR ready on that, but correct me?

deitch avatar Aug 03 '20 11:08 deitch

Hence a knee-jerk attempt at configuring containerd to store all its data in vault won't work (at least not until we split vaultmgr into something that may need to run before containerd -- which is a conversation @eriknordmark and @cshari-zededa need to have).

containerd is a base system tool. It is started as part of init, and installed here. Essentially, that installs init (the init, that the kernel starts on boot), which itself launches containerd.

We only call our storage-init here, as part of onboot, which is called by init (almost: init calls runc to run the onboot, but close enough), which also calls containerd.

If we really want all of this encrypted, then this needs to be an earlier stage, an init type stage. By the time we get to onboot, let alone services, it is assumed that base system stuff (like containerd) is up and running.

We could discuss an even earlier stage, but in truth, that is what init managers (like rc.init) are for.

Let me turn it around. Do we have clear documentation of what vaultmgr does, what its states are, when it is used/needed? I suspect that it needs to be part of init, but I also suspect that it performs multiple tasks, some of which fit into different stages, and thus may need refactoring, or may not.

Somewhat separately, if we had a clear, good design for filesystem encryption - or better yet, an interface to it with support for different actual implementations - we could make it a standard part of linuxkit, either as part of init or even at a higher level.

deitch avatar Aug 03 '20 11:08 deitch

One other bit of help I need from @deitch is figuring out why setting plugins.content.root in the containerd/config.toml doesn't seem to affect the setting on that plugin. Running ctr plugins ls -d suggested it should be tweakable

That is strange. I am curious why we do not just set the high-level root to be under the vault - or even leave it where it is but symlink it to somewhere in the vault - and then everything else will just work. We probably wouldn't even need to do some of the changes in this PR.
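For illustration, a minimal sketch of the "set the high-level root" option in containerd's config.toml. The top-level `root` key is a real containerd setting; the vault path below is a hypothetical location, not one taken from this PR. The ordering problem described at the top of the thread still applies: the path has to be usable when containerd starts, which is before the vault is unlocked.

```toml
# Sketch only: relocate all of containerd's persistent data (content store,
# snapshots, metadata) by changing the top-level root, instead of
# overriding individual plugins.
# ASSUMPTION: /persist/vault/containerd is a hypothetical path inside the vault.
root = "/persist/vault/containerd"
```

Each plugin would then keep its data in its own subdirectory under that root, so no per-plugin `root` override should be needed.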

deitch avatar Aug 03 '20 11:08 deitch

Let me turn it around. Do we have clear documentation of what vaultmgr does, what its states are, when it is used/needed? I suspect that it needs to be part of init, but I also suspect that it performs multiple tasks, some of which fit into different stages, and thus may need refactoring, or may not.

Somewhat separately, if we had a clear, good design for filesystem encryption - or better yet, an interface to it with support for different actual implementations - we could make it a standard part of linuxkit, either as part of init or even at a higher level.

@deitch vaultmgr and in general data encryption at rest on EVE are discussed here: https://github.com/lf-edge/eve/blob/master/pkg/pillar/docs/vaultmgr.md https://wiki.lfedge.org/display/EVE/Encrypting+Sensitive+Information+at+Rest+at+the+Edge

vaultmgr has an interaction with "Measured Boot and Remote Attestation" functionality, where the vault keys are "escrowed" with the Controller, and shared with the device only after successful attestation. Therefore there will be a window where the vault is locked, and waiting for the keys, until zedagent comes up and completes attestation cycle. This is discussed here: https://wiki.lfedge.org/display/EVE/Measured+Boot+and+Remote+Attestation#MeasuredBootandRemoteAttestation-ModuleLevelInteraction-EVEStartupSequence(Rebootwithachange)

cshari-zededa avatar Aug 03 '20 14:08 cshari-zededa

until zedagent comes up and completes attestation cycle. This is discussed here:

OK, I remember that now; thanks for bringing it back up. And now we have the chicken and egg problem. containerd needs the vault unlocked before it starts, because it should run on a filesystem in the vault, but unlocking requires zedagent and vaultmgr, which run as containers, which are managed by containerd, which...

An alternative approach might be that we distinguish between user containers in containerd, which are in the vault, and system containers, which are not.

Unfortunately, it isn't at all clear to me how we can do that.

deitch avatar Aug 03 '20 15:08 deitch

Hey @deitch -- regardless of everything else -- any ideas on why:

[plugins]
  [plugins.content]
    root = "/var/persist/vault/content"

didn't take?

rvs avatar Aug 04 '20 00:08 rvs

An alternative approach might be that we distinguish between user containers in containerd, which are in the vault, and system containers, which are not.

Another approach may be to experiment with actually re-starting containerd when we unlock the vault. Theoretically restarting containerd should be a safe operation for all the things that are already running.

Regardless, @deitch it seems that we need to have a bit of a design thinking around this one. I filed a tracker story for you ;-)

And finally, here's one more update on this: the proposed hack didn't quite work after all. I'm not really sure what part of containerd gets unhappy about this type of swap but something does. Hence I'm going to close this PR and open a much smaller subset of it. The rest we will handle through more of a thorough design approach.

rvs avatar Aug 04 '20 03:08 rvs

[the hostess took up the broom] @rvs do you plan to return to this or we can close it?

rouming avatar Oct 12 '22 12:10 rouming

[the hostess took up the broom] @rvs do you plan to return to this or we can close it?

The problem was resolved in another PR: https://github.com/lf-edge/eve/pull/2524

giggsoff avatar Oct 12 '22 14:10 giggsoff

[the hostess took up the broom] @rvs do you plan to return to this or we can close it?

The problem was resolved in another PR: #2524

@giggsoff you mean the problem was resolved and now this pr can be merged or at least work on this pr can be continued?

rouming avatar Oct 12 '22 15:10 rouming

I mean the PR may be closed

giggsoff avatar Oct 12 '22 15:10 giggsoff

I mean the PR may be closed

all clear, thanks.

rouming avatar Oct 12 '22 15:10 rouming