[Snaps] #snapsafe support via SysGenID in Systemd

Open acatangiu opened this issue 3 years ago • 1 comments

Feature Tracker

This is a feature tracking issue for the work to enable Linux guests across the virtualization industry to safely and efficiently use snapshots [1] by adding a SysGenID (or similar) mechanism to Systemd [2].

Describe the desired solution

Such a mechanism is required in virtualized or containerized environments by applications that work with local copies or caches of world-unique data such as random values, UUIDs, monotonically increasing counters, cryptographic nonces, etc. These applications can be negatively affected by VM or container snapshotting when the VM or container is either cloned or returned to an earlier point in time.

Solving the uniqueness problem strongly enough for cryptographic purposes requires a mechanism which can deterministically reseed userspace PRNGs with new entropy at restore time. This mechanism must also support the high-throughput and low-latency use-cases that led programmers to pick a userspace PRNG in the first place; be usable by both application code and libraries; allow transparent retrofitting behind existing popular PRNG interfaces without changing application code; it must be efficient, especially on snapshot restore; and be simple enough for wide adoption.

We need to introduce a mechanism that standardizes an API for applications and libraries to be made aware of uniqueness breaking events such as VM or container snapshotting, and allow them to react and adapt to such events.

The System Generation ID is meant to help in these scenarios by providing a monotonically increasing u32 counter that changes each time the VM or container is restored from a snapshot.

The sysgenid service exposes a monotonically incrementing u32 System Generation counter via DBus as com.RFC.sysgenid, accessible at /com/RFC/sysgenid. It provides asynchronous SysGen counter update notifications, as well as counter retrieval and confirmation mechanisms. The counter starts from zero when the service is started and increments every time the system generation changes.
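The counter semantics described above can be modelled in a few lines. The sketch below is an in-process stand-in, not the proposed service: the class name `SysGenCounter` and its methods `get`, `subscribe`, and `bump` are hypothetical, and no DBus calls are made. Overflow behaviour is not specified in this issue, so the sketch simply raises an error at the u32 limit.

```python
from typing import Callable, List

U32_MAX = 0xFFFFFFFF


class SysGenCounter:
    """Hypothetical in-process model of the system generation counter."""

    def __init__(self) -> None:
        # The counter starts from zero when the "service" starts.
        self._value = 0
        self._subscribers: List[Callable[[int], None]] = []

    def get(self) -> int:
        """Synchronous retrieval of the current generation."""
        return self._value

    def subscribe(self, callback: Callable[[int], None]) -> None:
        """Register a callback to be told about generation updates."""
        self._subscribers.append(callback)

    def bump(self) -> int:
        """Record a generation change (e.g. after a snapshot restore)."""
        if self._value == U32_MAX:
            # Assumption: the issue does not say what happens on overflow.
            raise OverflowError("u32 generation counter exhausted")
        self._value += 1
        for callback in list(self._subscribers):
            callback(self._value)
        return self._value
```

A consumer would call `subscribe` once at startup and treat every callback as a signal that cached unique data may be stale.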

Userspace applications or libraries can (a)synchronously consume the system generation counter through the provided DBus interface, to make any necessary internal adjustments following a system generation update.
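As an illustration of the synchronous path, the sketch below wraps a userspace PRNG so that it reseeds whenever the generation value it reads differs from the one it last saw. Everything here is an assumption made for illustration: `SnapshotSafeRandom` and `read_generation` are hypothetical names, the counter source is passed in as a plain callable rather than a real DBus call, and Python's `random.Random` stands in for whatever PRNG an application uses (it is not suitable for cryptographic purposes).

```python
import os
import random
from typing import Callable


class SnapshotSafeRandom:
    """Userspace PRNG that reseeds when the generation value changes."""

    def __init__(self, read_generation: Callable[[], int]) -> None:
        # read_generation is a hypothetical stand-in for counter retrieval.
        self._read_generation = read_generation
        self._seen_generation = read_generation()
        self._rng = random.Random(os.urandom(32))
        self.reseed_count = 0

    def _check_generation(self) -> None:
        current = self._read_generation()
        # Compare for inequality: any change means the state may be cloned.
        if current != self._seen_generation:
            self._rng.seed(os.urandom(32))  # fresh entropy from the OS
            self._seen_generation = current
            self.reseed_count += 1

    def token_hex(self, nbytes: int = 16) -> str:
        """Return nbytes of PRNG output as hex, reseeding first if needed."""
        self._check_generation()
        return self._rng.getrandbits(nbytes * 8).to_bytes(nbytes, "big").hex()
```

Note that a check-then-use pattern like this still leaves a window between the generation check and the draw, so it is a starting point rather than a complete answer to the uniqueness problem.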

The provided DBus interface operations can be used to build a system level safe workflow that guest software can follow to protect itself from negative system snapshot effects.

System generation changes are driven by userspace software through a dedicated DBus method.

Describe possible alternatives

System Generation ID kernel driver [3].

Additional context

See [1], [2] and [3].

Checks

  • [x] Have you searched the Firecracker Issues database for similar requests?
  • [x] Have you read all the existing relevant Firecracker documentation?
  • [x] Have you read and understood Firecracker's core tenets?

[1] https://github.com/firecracker-microvm/firecracker/blob/master/docs/snapshotting/snapshot-support.md#snapshot-security-and-uniqueness
[2] https://github.com/systemd/systemd/issues/19269
[3] https://lkml.org/lkml/2021/3/23/927

acatangiu, May 07 '21 14:05

Removing this from the roadmap for now because we are working on #2476 and it is not clear whether a systemd integration of the snapsafety implementation will impact Firecracker.

xmarcalx, Jul 05 '22 11:07

We have been looking into other ways of implementing #snapsafety in Firecracker. Our focus at the moment is an extension to the VirtIO rng device [1] that allows building snapshot safety mechanisms focused on re-seeding PRNGs in the guest.

For use-cases other than PRNGs, we have proposed a simple extension to the VMGENID mechanism [2] that can be used from user-space systems like systemd.

I will close this since we do not plan to work with systemd to address this, and we will rely on mechanisms other than sysgenid.

[1] https://www.mail-archive.com/[email protected]/msg09016.html
[2] https://lkml.org/lkml/2023/5/31/414

bchalios, Nov 06 '23 10:11