How should clients handle interrupted updates?
I am currently doing some exploration into how clients should handle interrupted, partially successful updates. For example, say we have a client with a local cached copy of valid and unexpired metadata. We start an update process that includes new `timestamp`, `snapshot`, and `targets` metadata. Unfortunately, we download the new timestamp and snapshot and persist them to disk, but then the device loses power. When power is restored, the network is down. We'd still like to make queries against the TUF targets file, but according to the workflow, we should get an error. We can only recover from this by restoring the network.
This is particularly relevant to Fuchsia because of how we have built our packaging system. We want to treat the TUF targets as the list of executable packages, since that allows us to maintain a cryptographic chain of trust, all the way down to the bootloader, over what can be executed. All our packages are stored in a content-addressed filesystem, and we use the `custom` field in a TUF target to provide the mapping from a human-readable name to a merkle-addressed package blob. When we try to open a package, we first look in TUF to find the merkle, then check whether we've already downloaded that blob. If so, we open the package and serve it to the caller. See this slightly stale doc for more details. Because of this interrupted-update problem, there's a chance a Fuchsia device could be made unusable until we are able to finish updating our metadata.
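The name-to-blob lookup described above can be sketched roughly as follows. This is an illustrative sketch only: `resolve_package`, the package name, and the metadata/blob-store shapes are hypothetical, not Fuchsia's actual schema.

```python
def resolve_package(targets_metadata, name, blob_store):
    """Look up `name` in the targets metadata and return its merkle root
    if the content-addressed blob is already present locally, else None."""
    target = targets_metadata.get("targets", {}).get(name)
    if target is None:
        raise KeyError(f"package {name!r} not in TUF targets")
    # The `custom` field carries the human-readable-name -> merkle mapping.
    merkle = target["custom"]["merkle"]
    # Only serve the package if we have already downloaded the blob.
    return merkle if merkle in blob_store else None

# Toy example (hypothetical package name and merkle value):
metadata = {"targets": {"fuchsia-pkg/hello": {"custom": {"merkle": "abc123"}}}}
```

Note that this lookup is exactly what breaks after an interrupted update: if the client refuses to trust its cached `targets` metadata, `resolve_package` can never be reached.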
Failing that, we have had a few ideas on how to approach this:
- If an update fails, we could still query the latest local `targets` metadata, assuming it was signed with a key that's still trusted by the `root` metadata.
- During the update, we delay writing all the metadata to disk until all the files have been downloaded and verified. Then the files are written in one atomic transaction.
- For consistent-snapshot metadata (which is all we plan on supporting), fetch the `timestamp` metadata, but don't persist it to disk yet. Fetch and write the version-prefixed `snapshot` and `targets` metadata, and any other delegated metadata, to disk. Atomically write the `timestamp` metadata to disk, then clean up any old snapshot/targets/etc. metadata files.
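The commit ordering in the last idea could be sketched like this. `commit_update` and the version-prefixed filenames are illustrative assumptions; the sketch presumes the snapshot, targets, and delegated metadata have already been fetched and verified, and that the filesystem's `rename` is atomic.

```python
import os
import tempfile

def commit_update(metadata_dir, timestamp_bytes, versioned_files):
    """Persist verified metadata, committing the timestamp last.

    `versioned_files` maps version-prefixed names (e.g. "5.snapshot.json")
    to verified bytes. A crash before the final rename leaves the previous
    timestamp, and therefore the previous consistent metadata set, intact.
    """
    for name, data in versioned_files.items():
        with open(os.path.join(metadata_dir, name), "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make the versioned files durable first
    # Write the new timestamp to a temp file, then rename it into place;
    # os.replace() is atomic on POSIX filesystems.
    fd, tmp = tempfile.mkstemp(dir=metadata_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(timestamp_bytes)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, os.path.join(metadata_dir, "timestamp.json"))
```

Because the version-prefixed files are never referenced until the new timestamp lands, stray files from an interrupted run are harmless and can be garbage-collected later.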
I'm not sure whether these ideas would weaken the TUF security model, though. Is there a better way of dealing with this, and could we incorporate it into the spec (or a POUF?), since I imagine other folks will need a solution for this too.
I think option 2 above makes more sense:

> During the update, we delay writing all the metadata to disk until all the files have been downloaded and verified. Then the files are written in one atomic transaction.
For implementation simplicity, this could be as simple as having `current` and `next` directories (where `next` is probably a temporary directory). The updater workflow would proceed in the `next` directory until all files have been downloaded and verified; only then are the contents of `next` moved to `current`. This is simpler than the `current`/`previous` model because we don't have to worry about loading partial metadata, only about preventing it from being persisted.
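A minimal sketch of that `current`/`next` scheme, assuming a hypothetical `write_all_metadata` callback that downloads and verifies every metadata file into the directory it is given, raising on any failure:

```python
import os
import shutil
import tempfile

def atomic_metadata_update(base_dir, write_all_metadata):
    """Stage the update in a scratch `next` directory; expose it as
    `current` only once every file has been downloaded and verified."""
    current = os.path.join(base_dir, "current")
    old = os.path.join(base_dir, "old")
    next_dir = tempfile.mkdtemp(prefix="next-", dir=base_dir)
    try:
        write_all_metadata(next_dir)  # may raise; nothing committed yet
    except Exception:
        shutil.rmtree(next_dir)       # discard the partial state
        raise
    # POSIX rename() won't replace a non-empty directory, so move the old
    # metadata aside first. A crash between the two renames is recoverable,
    # since `old` still holds a complete, verified metadata set.
    shutil.rmtree(old, ignore_errors=True)
    if os.path.exists(current):
        os.rename(current, old)
    os.rename(next_dir, current)      # the commit point
    shutil.rmtree(old, ignore_errors=True)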
I don't believe this would weaken the TUF security model, but perhaps others will speak up.
Aside, I'd love to see a POUF for Fuchsia's TUF implementation.
The detailed client workflow states:
> Note: If a step in the following workflow does not succeed (e.g., the update is aborted because a new metadata file was not signed), the client should still be able to update again in the future. Errors raised during the update process should not leave clients in an unrecoverable state.
The reference implementation handles this by storing `current` and `previous` versions of the metadata.