Prototype support for content addressable systems such as IPFS

Open adityasaky opened this issue 2 years ago • 8 comments

NOTE: This ticket is for a potential GSoC 2023 task.

TUF’s specification was written with artifacts stored in traditional file systems in mind. As such, it specifies explicitly how artifacts must be hashed in order to guarantee their integrity. Since TUF was first created, however, content addressable systems for storage and data transmission have become more prominent. Some examples of these systems are Git, the InterPlanetary File System (IPFS), and OSTree. All of these can present a file-like interface for artifacts they store, and have built-in mechanisms for ensuring the integrity of artifacts. When TUF is used with these systems, it is redundant for it to also ensure artifact integrity. Instead, TUF can delegate these guarantees to the underlying content addressable system, and focus on higher level security properties the specification provides. As part of this GSoC project, the participant will add support to an existing TUF implementation to delegate artifact integrity verification to the underlying content addressable system, specifically IPFS.
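To make "built-in mechanisms for ensuring the integrity of artifacts" concrete, here is a minimal sketch of how Git derives a blob's object ID from its content (SHA-1 object format). The helper names `git_blob_id` and `verify_blob` are made up for illustration and are not part of python-tuf or the draft TAP. Because the address is a hash of the bytes, fetching by address and re-hashing already detects tampering.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git (SHA-1 object format) assigns to a blob."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def verify_blob(content: bytes, expected_id: str) -> bool:
    """Check that content matches the address it was requested under."""
    return git_blob_id(content) == expected_id


if __name__ == "__main__":
    data = b"some artifact bytes"
    address = git_blob_id(data)
    print(verify_blob(data, address))          # content matches its address
    print(verify_blob(data + b"!", address))   # modified content does not
```

IPFS and OSTree use different hash functions and encodings, but the principle is the same: the identifier commits to the content.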

Also see: https://github.com/theupdateframework/taps/pull/156

Primary Goal

Allow delegating just the black-box targets to the content-addressing system. This is what our current draft TAP, https://github.com/theupdateframework/taps/pull/156, specifies. This is less invasive since, as stated above, targets are already black-box data to the rest of TUF. The draft TAP is largely agnostic about which mechanism is used; Git, IPFS, and OSTree above are only examples. Given the black-box nature of targets, we think this is the correct choice. The GSoC mentee is welcome to support just one of those systems, or several, in their prototype implementation.

Stretch Goal

TBD/WIP

GSoC Mentors

If accepted, this task will be mentored by myself (@adityasaky), John Ericson (@Ericson2314), and Marina Moore (@mnm678). This ticket was authored by all of us.

adityasaky • Mar 07 '23 17:03

Hi, I am interested in working on this project and applying for GSoC 2023. How can I contact you?

PandiyanDev • Mar 13 '23 04:03

This task has been assigned to @shubham4443. @mnm678 would it be possible to assign it to him formally on the issue?

adityasaky • Jun 09 '23 16:06

@shubham4443 if you add a comment here I can assign you (GitHub limits assignees to folks who have commented or have permission in the repo).

mnm678 • Jun 09 '23 16:06

@mnm678 Adding a comment.

shubham4443 • Jun 09 '23 17:06

Just thinking out loud here: The seeming difficulty in properly integrating IPFS (and the fact that the use cases in the TAP seem so different from each other from an implementation perspective) leads me to wonder whether it makes sense for python-tuf to handle the download at all. The whole point of TAP-19 seems to be that the TUF library no longer manages integrity, only the correct delegation... so why would we go through the trouble of abstracting the concept of "download a thing" for all of {http,ipfs,git,ostree}?

What if the application that uses python-tuf just worked like this instead:

import requests
import tuf.ngclient

# gateway_url and parse_cid are application-provided (not part of python-tuf)
updater = tuf.ngclient.Updater(...)

if not updater.get_targetinfo(targetpath):
    raise RuntimeError("oops, target not found")

# tuf has now confirmed the targetpath is signed by the correctly delegated role: we can download
response = requests.get(gateway_url + parse_cid(targetpath), timeout=5)

I can see a couple of possible issues:

  • This wouldn't use python-tuf's artifact cache, which could be seen as a negative... but considering the design includes a local IPFS gateway, that seems like the correct place to cache things in this design?
  • the application would have to specifically support IPFS (instead of just using TUF to download a file without caring about the mechanism). I'm not sure if this is a major negative, as
    • the different TAP19 systems will likely require that anyway (a git repo/commit is not a file -- there's no point in pretending it is)
    • the IPFS implementation in the PR requires the local gateway anyway: so in practice the application needs to ensure that a gateway is running, meaning it does know about the mechanism
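The `parse_cid(targetpath)` call in the snippet above is left undefined in the comment. A minimal sketch of what such a helper could look like follows, assuming (hypothetically) that target paths are written as `ipfs://<cid>`; that convention, the `IPFS_PREFIX` constant, and `gateway_request_url` are illustrative and not taken from the draft TAP or the PR. The `/ipfs/<cid>` path is the usual path-style layout of an IPFS HTTP gateway, and here the helper adds it rather than expecting it in `gateway_url`.

```python
IPFS_PREFIX = "ipfs://"  # hypothetical target path convention, for illustration only


def parse_cid(targetpath: str) -> str:
    """Extract the CID from a target path of the assumed form ipfs://<cid>."""
    if not targetpath.startswith(IPFS_PREFIX):
        raise ValueError(f"not an IPFS target path: {targetpath!r}")
    cid = targetpath[len(IPFS_PREFIX):]
    if not cid or "/" in cid:
        raise ValueError(f"malformed CID in target path: {targetpath!r}")
    return cid


def gateway_request_url(gateway_url: str, targetpath: str) -> str:
    """Build a path-style gateway URL (<gateway>/ipfs/<cid>) for a target."""
    return f"{gateway_url.rstrip('/')}/ipfs/{parse_cid(targetpath)}"


if __name__ == "__main__":
    # "QmExampleCid" is a placeholder, not a real CID.
    print(gateway_request_url("http://127.0.0.1:8080", "ipfs://QmExampleCid"))
```

The application would call this only after `get_targetinfo()` has confirmed the target path is listed in correctly delegated metadata.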

jku • Jun 29 '23 13:06

What if the application that uses python-tuf just worked like this instead:

import requests
import tuf.ngclient

# gateway_url and parse_cid are application-provided (not part of python-tuf)
updater = tuf.ngclient.Updater(...)

if not updater.get_targetinfo(targetpath):
    raise RuntimeError("oops, target not found")

# tuf has now confirmed the targetpath is signed by the correctly delegated role: we can download
response = requests.get(gateway_url + parse_cid(targetpath), timeout=5)

or as another option: A small python-tuf-ipfs library implements a downloader client library with a nice IPFS specific API that just uses python-tuf like above
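As a rough idea of how thin such a library could be, here is a sketch of a wrapper class. `IPFSDownloader`, `TargetLookup`, `to_url`, and `fetch` are hypothetical names invented for this example; the only python-tuf call it relies on is `Updater.get_targetinfo()`, which returns `None` for unknown targets. The URL mapping and the HTTP fetch are injected so the class itself makes no assumption about the target path format or the gateway.

```python
from typing import Callable, Optional, Protocol


class TargetLookup(Protocol):
    """The part of tuf.ngclient.Updater this sketch depends on."""

    def get_targetinfo(self, target_path: str) -> Optional[object]: ...


class IPFSDownloader:
    """Hypothetical wrapper: TUF decides *whether* to fetch, IPFS does the fetching."""

    def __init__(
        self,
        updater: TargetLookup,
        to_url: Callable[[str], str],
        fetch: Callable[[str], bytes],
    ) -> None:
        self._updater = updater
        self._to_url = to_url  # maps a target path to a gateway URL
        self._fetch = fetch    # e.g. a small function around requests.get

    def download(self, targetpath: str) -> bytes:
        # Only fetch targets that appear in correctly delegated TUF metadata.
        if self._updater.get_targetinfo(targetpath) is None:
            raise LookupError(f"target not found in TUF metadata: {targetpath!r}")
        # Integrity of the returned bytes is left to the IPFS gateway.
        return self._fetch(self._to_url(targetpath))
```

An application would construct this with a real `tuf.ngclient.Updater`, its own path-to-URL function, and a fetch function that talks to the local gateway.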

jku • Jun 29 '23 13:06

The stretch goal up in the original post is content-addressing the metadata. I finally found some time this morning, and clarified and wrote down my thoughts in https://github.com/Ericson2314/tuf-content-addressing-notes. I would be more than happy to transfer that repo to this org / otherwise make it a collaboration!

@adityasaky in https://github.com/theupdateframework/python-tuf/pull/2415#issuecomment-1619257835 you wrote:

@Ericson2314 I'm not sure if this is practical, though it depends on "root" in your message. Do you mean we remove the snapshot role and have the timestamp role identify the IPFS root node that contains the current set of all TUF metadata?

Yes, I was very unclear, and my thoughts have baked since then. The tl;dr of the notes above is:

  • snapshot role vs timestamp role separation still seems good
  • It's the consistent snapshots protocol (e.g. version numbers in file names) that is obviated
  • snapshot objects themselves are more important, because they replace the notion of a repository.
  • the root object (in the sense of the object defining the root role) != the root object of the Merkle DAG, confusing! :)
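A toy illustration of the last three points, not taken from the linked notes: the in-memory `STORE`, the `put`/`get`/`publish` helpers, and the simplified metadata layout are all assumptions made for this sketch. Each object is stored under the SHA-256 of its serialized bytes, and the snapshot object refers to other metadata by digest, so no version-prefixed file names are needed to pin a consistent set.

```python
import hashlib
import json
from typing import Dict

STORE: Dict[str, bytes] = {}  # stand-in for a content-addressed store


def put(obj: dict) -> str:
    """Store an object under the SHA-256 of its canonical-ish JSON bytes."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    STORE[digest] = data
    return digest


def get(digest: str) -> dict:
    """Load an object and check that its bytes still match its address."""
    data = STORE[digest]
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("stored bytes do not match their address")
    return json.loads(data)


def publish(root_role: dict, targets: dict) -> str:
    """Return the address of a snapshot object linking to the given metadata."""
    root_id = put(root_role)
    targets_id = put(targets)
    # The snapshot pins exact metadata objects by digest, not by version number.
    return put({"_type": "snapshot", "meta": {"root": root_id, "targets": targets_id}})
```

In this toy layout the snapshot object is the root of the Merkle DAG, while the object describing the root role is just one of its leaves. Publishing changed targets metadata yields a new snapshot address, and the unchanged root role object keeps its address.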

Ericson2314 • Jul 04 '23 17:07

Prototype can now be found here - https://github.com/theupdateframework/tap19-ipfs-poc

shubham4443 • Sep 06 '23 14:09