Add Software ID in opam file

Open rjbou opened this issue 4 years ago • 3 comments

Add Software Heritage ID field swhid in url section, and in opam show, and fallback support

rjbou avatar Oct 06 '21 13:10 rjbou

A solution might be to ping softwareheritage.org first?

A real ping isn't a good solution, as some (weird but real) networks disallow ICMP requests.

kit-ty-kate avatar Aug 05 '22 15:08 kit-ty-kate

In fact, we can simply retrieve the default page and drop the result.
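As an illustration of the idea (not opam's actual implementation), a reachability check over HTTP instead of ICMP could look like the sketch below. The function name `service_reachable`, the timeout value, and the injectable `opener` parameter are assumptions made for this example.

```python
import urllib.error
import urllib.request


def service_reachable(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if an HTTP GET on `url` succeeds; the body is discarded.

    `opener` is injectable so the check can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            # Read a single byte and throw it away: we only care that
            # the server answered, not what it said.
            response.read(1)
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, HTTP error status, etc.
        return False


if __name__ == "__main__":
    print(service_reachable("https://www.softwareheritage.org/"))
```

Because this goes through the same HTTP path as the later download, it also works on networks that block ICMP.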

rjbou avatar Aug 05 '22 15:08 rjbou

For my own education, is there any documentation about:

  • what Software Heritage IDs are
  • how they are added to an opam package
  • why they are different from a conventional content-addressed store (e.g. something like IPFS/hash fetching)
  • can the fallback be switched off, and/or does this affect purely offline use of opam with local files?

Seems like a nice feature, but I'm missing a little background.

avsm avatar Sep 08 '22 13:09 avsm

@avsm - apologies, this got discussed at a dev meeting in September, but only put into notes from the meeting, and not commented back (which is my fault...)!

what Software Heritage IDs are

The SWHID (see also their documentation) is just a(nother) content hash.
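To make "just another content hash" concrete, here is a minimal sketch of the simplest case: the SWHID of a single file's content (object type `cnt`), which uses the same construction as a git blob identifier. The function name `content_swhid` is made up for this example. An identifier for a whole release tree would be a `dir` SWHID, which hashes a directory listing and is not covered here.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a single file's content (object type "cnt")."""
    # Same construction as a git blob id:
    # SHA-1 over the header "blob <length>\0" followed by the raw bytes.
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    print(content_swhid(b"hello world\n"))
```

The result should match `git hash-object` on the same bytes, prefixed with `swh:1:cnt:`.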

how they are added to an opam package

This PR (and opam 2.2) is focussed on being able to consume the content hashes. For adding them, the intention is to have existing package submission tools (opam-publish, dune-release, etc.) generate the SWHID along with other hashes from the release tarball. opam-repository may then choose to lint based on these. Part of the Software Heritage project's work for programming language ecosystems is that they monitor opam-repository and ensure that the archives referenced are actually in their archive. It is also possible, and desirable, that package submission would itself submit the archive to SWH, but the point is that having a SWHID in an opam file neither depends on nor requires the archive actually having been submitted to the service.

Adding the facility to use the service in opam 2.2 obviously doesn't mandate deploying it for all packages on opam-repository (or in a future version) - just as adding the additional sha hash functions earlier in opam 1.x didn't.

why they are different from a conventional content-addressed store (e.g. something like IPFS/hash fetching)

They're related, indeed largely equivalent. The Software Heritage project, though, is focussed on persistence, where IPFS gives availability. I don't have an answer as to why they end up using different content hashes - although there is a project in progress to provide the Software Heritage archive over IPFS.

can the fallback be switched off, and/or does this affect purely offline use of opam with local files?

Yes - in fact, unless I've misread, it is not automatically enabled (it's an interactive prompt and would be off in batch/non-interactive mode, @rjbou?).

opam being able to retrieve archives from Software Heritage is a partial solution to an upstream server being down (opam.ocaml.org, GitHub, etc.), but that's not the primary point. Primarily, this provides an external, ongoing solution for things like https://github.com/ocaml/opam-source-archives and should mean that we never end up scrabbling to find archives in old server images when gforge services get shut down, personal web servers get taken offline, etc.

dra27 avatar Nov 02 '22 12:11 dra27

can the fallback be switched off, and/or does this affect purely offline use of opam with local files?

The fallback is enabled by default; it can be disabled with opam option swh_fallback=false. It is used as a last resort when the archive is unreachable via (1) the url defined in the opam package and (2) the opam repository cache and its mirrors. From there, a prompt asks whether you want to try downloading via SWH (see in test). Before calling the fallback or prompting, we ensure that the network is up and the SWH API is up.
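The order described above can be sketched as follows. This is only an illustration of the decision sequence, not opam's code: every name here (`fetch_archive`, `Fetcher`, and the parameters) is hypothetical, and each source is modelled as a callable that returns the archive bytes or `None` when unreachable.

```python
from typing import Callable, Iterable, Optional

# A fetcher returns the archive bytes, or None if that source is unreachable.
Fetcher = Callable[[], Optional[bytes]]


def fetch_archive(
    upstream: Fetcher,
    caches: Iterable[Fetcher],
    swh: Fetcher,
    swh_reachable: Callable[[], bool],
    confirm: Callable[[], bool],
    swh_fallback: bool = True,
) -> Optional[bytes]:
    """Try the package url, then cache and mirrors, then SWH as a last resort."""
    # 1. url defined in the opam package; 2. repository cache and its mirrors.
    for fetch in (upstream, *caches):
        data = fetch()
        if data is not None:
            return data
    # The fallback can be switched off (modelled on swh_fallback=false).
    if not swh_fallback:
        return None
    # Check that the network and the SWH API are up before prompting.
    if not swh_reachable():
        return None
    # Ask the user before downloading via SWH.
    if not confirm():
        return None
    return swh()
```

Note that `swh` is never called unless every earlier source failed, the option is on, the service is reachable, and the user agreed.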

rjbou avatar Nov 04 '22 15:11 rjbou

Many thanks for the updates. One quick design question though, after reading the SWHID docs linked: it looks like they define a URL scheme, so why don't we just interpret this scheme in url fields (just as we do for git+https, for example)?

avsm avatar Nov 07 '22 22:11 avsm

We wanted to use it but we couldn't, mainly for compatibility with old clients. See this comment.

redianthus avatar Nov 12 '22 12:11 redianthus

Happy to see this moving forward! As a side note, a working group is assembling to start the normalization process of SWHIDs, and it would be great if some of the contributors to this work in opam could join. See more information here https://www.swhid.org/

rdicosmo avatar Jan 25 '23 09:01 rdicosmo