Add Software ID in opam file
Add the Software Heritage ID field swhid to the url section and to opam show, plus fallback support
A solution might be to ping softwareheritage.org first?
A real ping isn’t a good solution, as some (weird but real) networks disallow ICMP requests.
In fact, we can simply retrieve the default page and drop the result.
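For illustration only, a minimal Python sketch of that idea: issue a plain HTTP GET and discard the body. The function name `is_reachable`, its `timeout` default and the injectable `opener` parameter are assumptions made for this example; this is not how opam itself implements the check.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if an HTTP GET on `url` succeeds; the body is discarded.

    `opener` is a hypothetical injection point so the check can be
    exercised without real network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            response.read(1)  # touch the body, then drop it
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, HTTP error or bad URL
        return False
```

Unlike an ICMP ping, this only needs outbound HTTP(S), which is what the later download would use anyway.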
For my own education, is there any documentation about:
- what Software Heritage IDs are
- how they are added to an opam package
- why they are different from a conventional content-addressed store (e.g. something like IPFS/hash fetching)
- can the fallback be switched off, and/or does this affect purely offline use of opam with local files?
Seems like a nice feature, but I'm missing a little background.
@avsm - apologies, this got discussed at a dev meeting in September, but only put into notes from the meeting, and not commented back (which is my fault...)!
what Software Heritage IDs are
The SWHID (see also their documentation) is just a(nother) content hash.
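As a rough illustration of the "content hash" point, here is a small Python sketch that computes the SWHID of a single file's contents (object type `cnt`), which uses the same framing as a Git blob hash. The function name `content_swhid` is made up for this example. Identifiers for directories (such as an unpacked release) are computed over a tree manifest and are not covered here.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of raw file contents (object type "cnt")."""
    # Same framing as a Git blob: "blob <length>\0" followed by the bytes.
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


# The empty file has the same hash as Git's empty blob.
print(content_swhid(b""))
# swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```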
how they are added to an opam package
This PR (and opam 2.2) is focussed on being able to consume the content hashes. For adding them, the intention is to have existing package submission tools (opam-publish, dune-release, etc.) generate the SWHID along with other hashes from the release tarball. opam-repository may then choose to lint based on these. Part of the Software Heritage project's work for programming language ecosystems is that they monitor opam-repository and ensure that the archives referenced are actually in their archive. It is also possible, and desirable, that the package submission would itself submit the archive to SWH, but the point is that putting a SWHID in an opam file is not gated on the archive having been submitted to the service, nor does it require that.
Adding the facility to use the service in opam 2.2 obviously doesn't mandate deploying it for all packages on opam-repository (or in a future version) - just as adding the additional sha hash functions earlier in opam 1.x didn't.
why they are different from a conventional content-addressed store (e.g. something like IPFS/hash fetching)
They're related, indeed largely equivalent. The Software Heritage project, though, is focussed on persistence, where IPFS gives availability. I don't have an answer as to why they end up using different content hashes - although there is a project in progress to provide the Software Heritage archive over IPFS.
can the fallback be switched off, and/or does this affect purely offline use of opam with local files?
Yes - in fact, unless I've misread, it is not automatically enabled (it's an interactive prompt and would be off in batch/non-interactive mode, @rjbou?).
opam being able to retrieve archives from Software Heritage is a partial solution to an upstream server being down (opam.ocaml.org, GitHub, etc.) but that's not the primary point. Primarily, this provides an external and ongoing solution for things like https://github.com/ocaml/opam-source-archives and should mean that we never end up scrabbling to find archives in old server images when gforge services get shut down, personal web servers get taken offline, etc.
can the fallback be switched off, and/or does this affect purely offline use of opam with local files?
The fallback is enabled by default; it can be disabled with opam option swh_fallback=false. It is used as a last resort, when the archive is unreachable via (1) the url defined in the opam package and (2) the opam repository cache and its mirrors. At that point, a prompt asks whether you want to try downloading via SWH (see in test). Before calling the fallback or prompting, we ensure that the network is up and that the SWH API is up.
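The order described above could be sketched roughly as follows. This is an illustrative Python model, not opam's actual code: `fetch_archive` and all of its parameters are hypothetical names, and each source is represented by a callable that returns the archive bytes or `None`.

```python
from typing import Callable, Optional

Fetcher = Callable[[], Optional[bytes]]


def fetch_archive(
    from_url: Fetcher,
    from_cache: Fetcher,
    from_swh: Fetcher,
    swh_fallback: bool = True,                       # models the swh_fallback option
    services_up: Callable[[], bool] = lambda: True,  # network and SWH API check
    confirm: Callable[[], bool] = lambda: False,     # user prompt; "no" by default
) -> Optional[bytes]:
    """Try the package url, then the cache and mirrors, then SWH as a last resort."""
    for fetch in (from_url, from_cache):
        data = fetch()
        if data is not None:
            return data
    if not swh_fallback:
        return None  # fallback disabled by configuration
    if not services_up():
        return None  # do not prompt if the network or SWH API is down
    if not confirm():
        return None  # user declined, or no answer was given
    return from_swh()
```

In this sketch the SWH source is only consulted once both regular sources have failed, the option is on, the service check passes, and the user agrees.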
Many thanks for the updates. One quick design question though, after reading the SWHID docs linked: it looks like they define a URL schema, so why don't we just interpret this schema in url fields (just as we do for https+git for example)?
We wanted to use it but we couldn't, mainly for compatibility with old clients. See this comment.
Happy to see this moving forward! As a side note, a working group is assembling to start the normalization process of SWHIDs, and it would be great if some of the contributors to this work in opam could join. More information is available at https://www.swhid.org/