
Live migrations fail when using shared storage

Open jean-edouard opened this issue 3 years ago • 13 comments

Describe the bug
When a shared drive is mounted under the location where swtpm stores its state, live migrations will fail if that drive is mounted on both the source and the target. I assume this is because of the lock that swtpm takes on those files. We need a way for the source swtpm to release the lock so that the target swtpm can take over during a live migration.

To Reproduce
Steps to reproduce the behavior:

  1. Start a root/system instance of libvirtd
  2. Create and mount a shared storage volume at /var/lib/libvirt/swtpm
  3. Create a VM with an emulated TPM device
  4. Mount the shared storage on every potential migration target and trigger a migration
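For step 3, the emulated TPM device in the libvirt domain XML looks roughly like the following sketch (the exact model and TPM version depend on the guest; by default libvirt keeps the swtpm state under /var/lib/libvirt/swtpm/):

```xml
<!-- Emulated TPM 2.0 device backed by swtpm -->
<tpm model='tpm-tis'>
  <backend type='emulator' version='2.0'/>
</tpm>
```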

Expected behavior
The live migration succeeds.

Desktop (please complete the following information):

  • OS: CentOS Stream
  • Version: 8

Versions of relevant components

  • swtpm: 0.7.0
  • libvirt: 8.0.0

jean-edouard avatar Aug 03 '22 13:08 jean-edouard

I can reproduce the same behaviour with the following setup:

• CephFS share mounted on all nodes
• Symlink from /var/lib/libvirt/swtpm to the CephFS share
• Create a VM with an emulated TPM device (libvirt XML config)

Migration fails because both nodes try to lock the same TPM state file. Maybe it’s possible to prevent QEMU from migrating the state file, not lock the file, and just rely on the (shared) filesystem.

• OS: Debian
• swtpm: 0.7.1
• libvirt: 7.0.0

mt32git avatar Aug 05 '22 14:08 mt32git

Maybe it’s possible to prevent QEMU from migrating the state file, not lock the file, and just rely on the (shared) filesystem.

QEMU only supports actively migrating the state of the TPM, so it's not possible to not migrate the state.

Are there any (caching) issues we need to be aware of with shared filesystems, where hosts might not always see the latest version of a file, or do shared filesystems guarantee that all sides always see the latest files?

stefanberger avatar Aug 08 '22 15:08 stefanberger

Maybe it’s possible to prevent QEMU from migrating the state file, not lock the file, and just rely on the (shared) filesystem.

QEMU only supports actively migrating the state of the TPM, so it's not possible to not migrate the state.

That's unfortunate... Do you reckon extending QEMU to support "passive" migration of the TPM state would be doable?

Are there any (caching) issues we need to be aware of with shared filesystems, where hosts might not always see the latest version of a file, or do shared filesystems guarantee that all sides always see the latest files?

Shared filesystems often support various caching options, some of which may not provide such guarantees. However, flushing file operations at the OS level on the migration source before the migration target opens the TPM state file should guarantee the target sees the latest version.
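As an illustration, flushing on the source side could look like the following Python sketch (the demonstration file stands in for the real TPM state file; swtpm itself would do this in C):

```python
import os
import tempfile

def flush_state_file(path):
    """Open the file and fsync it so that a migration target opening
    it afterwards sees the latest contents (assuming the shared
    filesystem honors fsync semantics)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # flush data and metadata for this file
    finally:
        os.close(fd)

# Demonstration with a temporary stand-in for the TPM state file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"tpm state")
    state_path = f.name
flush_state_file(state_path)
os.unlink(state_path)
```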

jean-edouard avatar Aug 08 '22 16:08 jean-edouard

I wonder whether we could mimic what the storage layer is doing. If we assume that the shared FS guarantees data consistency, then we are left with swtpm locking, where both sides of the migration lock the same file(s). Obviously, the destination won't succeed and thus migration fails. What QEMU does with disks (which it locks upon startup) is: on the destination, when the -incoming argument is present, it does not lock anything but waits for the source to unlock the disks and signal the destination (when guest control switches to the destination), at which point the destination locks the disks.
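The disk-lock handoff described here can be sketched with POSIX advisory locks (a Python sketch on a stand-in file; swtpm's real locking mechanism may differ):

```python
import fcntl
import os
import tempfile

# Temporary stand-in for a TPM state file on shared storage.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"tpm state")
    state = f.name

src = open(state, "r+b")  # "source" side of the migration
dst = open(state, "r+b")  # "destination" side

# The source holds an exclusive lock on the state, as QEMU does
# with disk images on startup.
fcntl.flock(src, fcntl.LOCK_EX | fcntl.LOCK_NB)

# The destination must not lock yet; a non-blocking attempt fails.
try:
    fcntl.flock(dst, fcntl.LOCK_EX | fcntl.LOCK_NB)
    destination_blocked = False
except BlockingIOError:
    destination_blocked = True

# When guest control switches over, the source releases the lock ...
fcntl.flock(src, fcntl.LOCK_UN)
# ... and only now does the destination take it.
fcntl.flock(dst, fcntl.LOCK_EX | fcntl.LOCK_NB)

src.close()
dst.close()
os.unlink(state)
```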

Could we have something similar for swtpm?

zippy2 avatar Aug 09 '22 08:08 zippy2

Could we have something similar for swtpm?

My concern here is the somewhat difficult-to-test case of migration failure and fallback ...

stefanberger avatar Aug 09 '22 12:08 stefanberger

So basically there are two choices:

  1. allow avoiding file locking altogether to support shared storage
  2. introduce (probably) two new command-line flags: one to avoid taking the lock due to an incoming migration (--migration incoming) and one to release the lock once the state of the TPM 2 has been extracted (--migration release-lock-outgoing)

In either case QEMU should be able to migrate the TPM 2 state from source to destination as it does now. On the destination side it would overwrite the tpm2-00.permall file and the overwritten file should be equivalent to the one the source side previously wrote. The volatile state of the TPM 2 would also be the latest on the destination side and unchanged on the source side. So QEMU shouldn't have to be changed for this.

The fallback case in case of migration failure: the swtpm on the destination side will likely need to terminate first to release the lock, since it grabs the lock once it starts writing the tpm2-00.permall file upon reception of the state. On the source side, swtpm should probably grab the lock again once QEMU resumes there, but swtpm isn't notified of the resume. So the earliest it can grab the lock is when it again sees activity on its file descriptors. Depending on how actively the TPM 2 is being used, this could leave a time window where neither side holds the lock, or the source could fail to grab the lock if swtpm on the destination side hasn't terminated yet. For the latter we may have to retry the lock for a second or so. Even though I would like to avoid the retry loop, I don't see how else it could be done.
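The retry described above might look roughly like this (a Python sketch with hypothetical names; the real implementation would live in swtpm's C code):

```python
import fcntl
import time

def grab_lock_with_retry(fd, timeout=1.0, interval=0.05):
    """Try to take an exclusive lock on the state file, retrying for
    up to `timeout` seconds in case the peer swtpm (e.g. a destination
    instance terminating after a failed migration) still holds it.
    Returns True once the lock is held, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except BlockingIOError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```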

stefanberger avatar Aug 10 '22 15:08 stefanberger

I now have a branch with a prototype that could be used for testing with shared storage. It introduces the two new flags mentioned above: https://github.com/stefanberger/swtpm/tree/stefanberger/release_lock_for_shared_storage

There's currently a patch on the tip of this series that allows running swtpm with existing libvirt, without having to extend libvirt to start swtpm with the new command-line options. So the hope is that using swtpm with these extensions on both the source and destination side would get it to work. Later on we'll have to extend libvirt to pass the new options on the swtpm command line.

```diff
diff --git a/src/swtpm/swtpm.c b/src/swtpm/swtpm.c
index 6a9d3a4..1cb9fc9 100644
--- a/src/swtpm/swtpm.c
+++ b/src/swtpm/swtpm.c
@@ -484,6 +484,8 @@ int swtpm_main(int argc, char **argv, const char *prgname, const char *iface)
         goto exit_failure;
     }
 
+    mlp.incoming_migration = mlp.release_lock_outgoing = true;
+
     if (server) {
         if (server_get_fd(server) >= 0) {
             mlp.fd = server_set_fd(server, -1);
```
Any help testing this would be appreciated.

stefanberger avatar Aug 10 '22 23:08 stefanberger

That's awesome thank you! I'll give it a try asap!

jean-edouard avatar Aug 11 '22 01:08 jean-edouard

I'm happy to report that it worked!! I tested this using the stefanberger/release_lock_for_shared_storage branch (with the tip at 7b0c8b51cb69dd27c1457b9a7946ecafad3dd7b6; I see you've worked on it since). I created a VM and ran a migration + reboot test as well as a reboot + migration test, and both succeeded consistently with shared storage mounted on both the migration source and target. After each migration/reboot, I verified the TPM still worked. (This is the code for the test: https://github.com/jean-edouard/kubevirt/blob/backendstorageandtpm/tests/vmi_tpm_test.go#L150)

Thanks again for this, please let me know if there's anything else I can help with to push this forward!

jean-edouard avatar Aug 11 '22 18:08 jean-edouard

I sent PR https://github.com/stefanberger/swtpm/pull/732 now.

If you want to use these swtpm modifications with existing, unmodified libvirt, you will have to apply the patch above (also posted in the PR) to simulate the passing of the new migration options.

stefanberger avatar Aug 11 '22 19:08 stefanberger

Any more testing with CephFS and other shared filesystems would be really helpful. Thanks.

stefanberger avatar Aug 11 '22 19:08 stefanberger

Awesome, yeah I forgot to mention, I used NFS for the tests I mentioned above. I am not sure how CephFS works, but I will look into it next week! Thank you for the PR!!

jean-edouard avatar Aug 11 '22 19:08 jean-edouard

Maybe @mt32git can give the PR #732 a try with his CephFS setup (and apply that patch if needed on both sides).

stefanberger avatar Aug 11 '22 19:08 stefanberger

@jean-edouard I am trying to test this now also with NFS and it unfortunately doesn't work for me -- at least not when I am trying it with libvirt.

stefanberger avatar Aug 19 '22 17:08 stefanberger