Add support for additional retrieval protocols (HTTP)
Background
Currently, ipfs-check only supports Bitswap checks. It would be useful to add support for other transfer protocols that are already widely deployed, e.g. transport-ipfs-gateway-http, for which there are many providers in the IPNI.
As part of this, the UI should let users choose which protocols to test in a check.
Checking HTTP providers
As discussed in #70,
If we want to probe HTTP, we can make a trustless gateway HEAD request with Accept: application/vnd.ipld.raw and ?format=raw (see https://specs.ipfs.tech/http-gateways/path-gateway/#only-if-cached-head-behavior).
Challenges
Existing interfaces
We currently rely on the FindProvidersAsync(context.Context, cid.Cid, int) <-chan peer.AddrInfo method to handle getting providers from both the IPNI and the DHT concurrently. The challenge is that we don't have protocol information in that channel. We will likely need a helper method to convert the iterator from the Routing V1 client into a channel that returns results (rather than peer.AddrInfo) so that we can rely on the same concurrency primitives while having the protocol information.
Problems with transport-ipfs-gateway-http providers
Let's take the example you gave of the CID: bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi.
It has two providers with the "transport-ipfs-gateway-http" protocol:
[
{
"Addrs": [
"/ip4/212.6.53.27/tcp/80/http"
],
"ID": "12D3KooWHEzPJNmo4shWendFFrxDNttYf8DW4eLC7M2JzuXHC1hE",
"Protocols": [
"transport-ipfs-gateway-http"
],
"Schema": "peer",
"transport-ipfs-gateway-http": "oBIA"
},
{
"Addrs": [
"/ip4/212.6.53.28/tcp/80/http"
],
"ID": "12D3KooWJ8YAF6DiRxrzcxoeUVjSANYxyxU55ruFgNvQB4EHibpG",
"Protocols": [
"transport-ipfs-gateway-http"
],
"Schema": "peer",
"transport-ipfs-gateway-http": "oBIA"
}
]
A couple of problems with this one specifically:
- It's only HTTP. No TLS, so not usable in the browser.
- Even over HTTP, it doesn't fully implement the trustless gateway protocol, e.g. HEAD requests are not allowed:
$ http HEAD http://212.6.53.28/ipfs/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi "Accept: application/vnd.ipld.raw"
HTTP/1.1 405 Method Not Allowed
Allow: GET
Content-Length: 18
Content-Type: text/plain; charset=utf-8
Date: Wed, 02 Oct 2024 11:14:49 GMT
Vary: Origin
Vary: Accept-Encoding
A deeper investigation revealed that this is a Boost server. Boost uses frisbii as its gateway server, and frisbii doesn't support HEAD only-if-cached requests. I have opened an issue for that.
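To make the "HTTP only, no TLS" observation mechanical, here is a rough sketch that turns a textual multiaddr like the ones above into a base URL and reports whether TLS is involved. httpURLFromMultiaddr is a hypothetical helper, and the string matching is a deliberate simplification that only covers a few common address shapes; a real implementation would more likely use a proper multiaddr parser.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// httpURLFromMultiaddr converts a textual multiaddr into a base URL if it
// describes an HTTP endpoint. secure is true only when TLS is part of the
// address. Assumption: only the shape
// /<ip4|ip6|dns|dns4|dns6>/<host>/tcp/<port>/<http|https|tls/http>
// is recognized; anything else is reported as not HTTP.
func httpURLFromMultiaddr(ma string) (url string, secure bool, ok bool) {
	parts := strings.Split(strings.TrimPrefix(ma, "/"), "/")
	if len(parts) < 5 || parts[2] != "tcp" {
		return "", false, false
	}
	switch parts[0] {
	case "ip4", "ip6", "dns", "dns4", "dns6":
	default:
		return "", false, false
	}
	// JoinHostPort adds the brackets that IPv6 literals need.
	hostPort := net.JoinHostPort(parts[1], parts[3])
	switch strings.Join(parts[4:], "/") {
	case "http":
		return "http://" + hostPort, false, true
	case "https", "tls/http":
		return "https://" + hostPort, true, true
	}
	return "", false, false
}

func main() {
	for _, ma := range []string{
		"/ip4/212.6.53.27/tcp/80/http",       // from the provider record above
		"/dns4/example.com/tcp/443/tls/http", // made-up example address
		"/ip4/212.6.53.27/tcp/4001",          // made-up non-HTTP address
	} {
		url, secure, ok := httpURLFromMultiaddr(ma)
		fmt.Printf("%s -> url=%q secure=%v http=%v\n", ma, url, secure, ok)
	}
}
```

With something like this, the checker could flag providers whose only HTTP address has no TLS before sending any probe.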
IMO it's both unhelpful and irresponsible to allow configurability without documentation to support it. For example, allowing someone to add http-trustless-gateway is irresponsible without explaining that:
- it's not supported in the most common places (e.g. kubo, rainbow, and ipfs.io)
- we aren't testing whether it's retrievable yet (easy and we can do it for HTTP trustless gateway, but for other ones that's something we'd have to consider on a case-by-case basis and isn't really doable in the generic sense)
This ^ is related to building the start of a caniuse.com equivalent for IPFS tooling, just to help people understand what interoperability means... but it can quickly turn into a bigger ordeal than I think we have time for right now.
IMO it's also fine for us to limit what the tool (or at least the deployment on check.ipfs.network) does to support what people think of when they think of IPFS (i.e. "mainnet") to reduce confusion.
In practice my recommendation for http trustless gateway is to do one of:
1. Don't add support until it's present in at least kubo, rainbow and ipfs.io
2. Add support now, with warning signs that this protocol isn't yet supported in common implementations and gateway deployments
My inclination is towards 1, but if people think 2 would be helpful that's fine by me.
Related research/experiment started in https://github.com/ipfs/boxo/pull/747
I think we could start implementing HTTP probing based on work from https://github.com/ipfs/boxo/pull/747.
Some loose implementation notes about important parts:
- For now, make it opt-in (behind a checkbox in "Backend Config", passing it to backend same way we pass "Check Timeout")
- Realistic focus here is "bitswap+HTTP" mid-term future where HTTP means ?format=raw from https://specs.ipfs.tech/http-gateways/trustless-gateway/ (no CARs), and we prefer HTTP retrieval when peer exposes both.
- We should include peers with transport-ipfs-gateway-http, but also keep in mind this is IPNI-specific metadata, and we want to also support providers that announce on DHT that they speak HTTP.
  - When deciding which peers should be probed for HTTP, we should follow logic from Boxo, where we identify HTTP multiaddrs (/tls+http or https) and send a probe to them. This makes the checker future-proof: it will work when we have HTTP providers returned by systems other than IPNI.
    - Related spec work: https://github.com/ipfs/specs/pull/501
- Probe should be HTTP HEAD /ipfs/bafkqaaa?format=raw and, if that fails, fall back to GET
  - Make requests with both ?format=raw and Accept: application/vnd.ipld.raw for interop and to mirror boxo behavior
  - Require HTTP/2 (TLS with CA-signed cert + streams). This is important: we do it in boxo too to enforce correct TLS setup so those HTTP peers can be used in browser by verified-fetch and https://inbrowser.link
  - Expect HTTP 200
- https://check.ipfs.network/ should error if peer announces HTTP but does not reply correctly to both HEAD and GET
  - lack of HEAD support is a soft fail / warning. HEAD acts like Have check from Bitswap 1.2, and we want to penalize providers who don't support it (display red error/warning)
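The probe rules in the notes above could be sketched roughly as follows. All names here (Verdict, classify, request, probe) are hypothetical and not taken from boxo or ipfs-check. The demo runs against a local httptest server with a test certificate so it works offline; a real probe would presumably use a default client so that only CA-signed certificates pass.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

type Verdict string

const (
	VerdictOK      Verdict = "ok"
	VerdictWarning Verdict = "warning: HEAD not supported"
	VerdictError   Verdict = "error"
)

// classify maps probe outcomes to a verdict: HTTP/2 is required, a working
// HEAD is a pass, and GET-only is a soft fail.
func classify(headOK, getOK, http2 bool) Verdict {
	switch {
	case !http2:
		return VerdictError
	case headOK:
		return VerdictOK
	case getOK:
		return VerdictWarning
	default:
		return VerdictError
	}
}

// request sends one probe with both ?format=raw and the Accept header, and
// reports whether it returned 200 and whether HTTP/2 was negotiated.
func request(ctx context.Context, c *http.Client, method, base string) (ok, http2 bool) {
	req, err := http.NewRequestWithContext(ctx, method, base+"/ipfs/bafkqaaa?format=raw", nil)
	if err != nil {
		return false, false
	}
	req.Header.Set("Accept", "application/vnd.ipld.raw")
	resp, err := c.Do(req)
	if err != nil {
		return false, false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, resp.ProtoMajor == 2
}

// probe tries HEAD first and falls back to GET only if HEAD fails.
func probe(ctx context.Context, c *http.Client, base string) Verdict {
	headOK, h2 := request(ctx, c, http.MethodHead, base)
	if headOK {
		return classify(true, false, h2)
	}
	getOK, h2 := request(ctx, c, http.MethodGet, base)
	return classify(false, getOK, h2)
}

func main() {
	// Local stand-in for a gateway that only allows GET.
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			w.Header().Set("Allow", "GET")
			w.WriteHeader(http.StatusMethodNotAllowed)
			return
		}
		w.Header().Set("Content-Type", "application/vnd.ipld.raw")
		w.WriteHeader(http.StatusOK)
	}))
	srv.EnableHTTP2 = true
	srv.StartTLS()
	defer srv.Close()

	fmt.Println(probe(context.Background(), srv.Client(), srv.URL))
}
```

Keeping classify separate from the network code makes the pass / soft fail / error policy easy to unit test and to adjust if the HEAD requirement is later tightened.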