Support direct HTTP retrieval from /https providers

Open lidel opened this issue 1 year ago • 4 comments

This is the Go version of https://github.com/ipfs-shipyard/service-worker-gateway/issues/72.

We want rainbow to benefit from /https providers (example) and use them in addition to bitswap.

Ideally, we would prioritize HTTP retrieval over bitswap where possible, as it lowers costs for content providers and incentivizes them to configure, expose, and announce HTTPS endpoints.

MVP scope

Focus should be on block (application/vnd.ipld.raw, ?format=raw) requests, as these will always work across all implementations and provide the best cacheability on the HTTP infrastructure we have.
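A minimal sketch of what such a block request could look like with Go's standard library. The helper name `newRawBlockRequest`, the gateway URL, and the CID used in `main` are placeholders for illustration, not rainbow or boxo APIs; the sketch sets both the `Accept` header and the `?format=raw` query parameter mentioned above.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newRawBlockRequest builds a GET request for a single raw block.
// The helper name and both arguments are illustrative (hypothetical).
func newRawBlockRequest(gatewayURL, cid string) (*http.Request, error) {
	u, err := url.Parse(gatewayURL)
	if err != nil {
		return nil, err
	}
	// Content path: /ipfs/{cid}, appended to any base path of the gateway URL.
	u.Path = strings.TrimSuffix(u.Path, "/") + "/ipfs/" + cid
	// Request the raw block via the query parameter...
	u.RawQuery = url.Values{"format": {"raw"}}.Encode()

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	// ...and via the Accept header.
	req.Header.Set("Accept", "application/vnd.ipld.raw")
	return req, nil
}

func main() {
	// Placeholder gateway and CID, for illustration only.
	req, err := newRawBlockRequest("https://gateway.example", "bafkqaaa")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL, req.Header.Get("Accept"))
}
```

A real client would also verify that the returned bytes hash to the requested CID before trusting them.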

CAR with IPIP-402 may be more involved and may lead to duplicated block retrievals because of how loading a page with a dozen subresources works: all subresources share the same parent and are fetched in parallel, which can create a race where parent blocks are fetched multiple times, slowing down page loads.

lidel avatar Apr 22 '24 14:04 lidel

Before continuing, I want to lay down some notes to make sure we're all on the same page about what needs to be done and about the current challenges with accepting the /https providers.

Most providers with HTTPS multiaddresses are unusable

Most, if not all, providers advertising /https multiaddresses are, strictly speaking, unusable per the standard: they do not follow the proper peer schema. We could certainly hammer the code into accepting them, but I would rather have the original provider of the records implement the correct schema. So, instead of:

{
  "Addrs": ["/dns4/dag.w3s.link/tcp/443/https"],
  "ID": "QmUA9D3H7HeCYsirB3KmPSvZh3dNXMZas6Lwgr4fv1HTTp",
  "Metadata": "oBIA",
  "Protocol": "transport-ipfs-gateway-http",
  "Schema": "unknown"
},

We should be getting this:

{
  "Schema": "peer",
  "ID": "QmUA9D3H7HeCYsirB3KmPSvZh3dNXMZas6Lwgr4fv1HTTp",
  "Addrs": ["/dns4/dag.w3s.link/tcp/443/https"],
  "Protocols": ["transport-ipfs-gateway-http"]
}

As I said, the code can be hammered into accepting the first form (albeit with a bit more effort in Go), but I would rather not go down that road. We already plan to completely remove support for "Schema": "bitswap" (e.g. from Pinata) from Boxo. Supporting one more non-standardized schema would just make things more complicated than they need to be.

Fetching the block via HTTPS

The current flow to fetch a block, from the Blockservice perspective, is as follows:

  1. Blockservice gets asked for a block
  2. Blockservice checks the Blockstore; if the block is there, it is returned. Otherwise,
  3. Blockservice asks the Exchange, which currently is just Bitswap
  4. Bitswap looks for providers using a routing.ContentRouter. This routing.ContentRouter only has Bitswap-related peers. All other peers are ignored, even if they come from a /routing/v1 endpoint.
  5. Bitswap tries to fetch the block, returns it, and so on.

I see a few ways of potentially solving this.

(a) Parallel Exchanges

Create a parallel exchange that calls both Bitswap and a new exchange that can take advantage of the Delegated Routing endpoint results that have non-Bitswap peers.

Challenges I see:

  1. Duplicate HTTP requests to delegated routing endpoints, done by both exchanges.

(b) Smarter Exchange

An exchange where you can register sub-exchanges (or fetchers) for certain protocol types. This exchange would call FindProviders itself and, depending on the results, would parallelize calls to different fetchers (Bitswap, Gateways, etc).

Challenges I see:

  1. We need to already be able to tell the Bitswap client that we know that peer X has the block Y to avoid it doing the FindPeers request again. Maybe it's already possible, but I'm not familiar enough with the code. Needs investigation.
  2. Reconcile Delegated Routing lookups with DHT lookups. Boxo only provides code for the opposite case: delegated routing to libp2p routers, ignoring every non-bitswap code. This is already done in someguy, which parallelizes DHT and Delegated Routing endpoints into a Delegated Routing-like interface. We'll likely want to re-use the code.

(b) seems technically more complicated (at least without looking at what is currently possible), but is likely better because it avoids duplicated HTTP requests and saves resources. We can also probably reuse the new RemoteBlockstore from boxo/gateway to fetch remote blocks from the /https peers.

hacdias avatar Apr 23 '24 12:04 hacdias

Triage:

  • Most providers with HTTPS multiaddresses are unusable
    • try to clean up cid.contact/routing/v1 responses, switch both http and bitswap providers to modern peer schema so we can remove hacks from boxo/kubo/rainbow
  • look into Smarter Exchange
    • assumption: we always have some peerid and some Addrs, so we can reuse interfaces from libp2p
    • try libp2p identify to learn p2p protocols, and use bitswap if present
    • if /http/tls or /https or /http is present, attempt HTTP retrieval instead of bitswap
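As a sketch of the last check, the following helper scans a multiaddr string for an /http or /https component. It uses naive string splitting with a small, incomplete table of value-carrying protocols (an assumption of this example) so that a hostname such as "http" is not mistaken for a protocol; real code would more likely parse addresses with a multiaddr library. The name `hasHTTPTransport` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// valueProtocols lists a few multiaddr protocols that are followed by a
// value component. The list is deliberately incomplete (sketch only).
var valueProtocols = map[string]bool{
	"ip4": true, "ip6": true, "dns": true, "dns4": true, "dns6": true,
	"dnsaddr": true, "tcp": true, "udp": true, "p2p": true,
}

// hasHTTPTransport reports whether addr contains an /http or /https
// protocol component, which also covers /tls/http.
func hasHTTPTransport(addr string) bool {
	parts := strings.Split(strings.Trim(addr, "/"), "/")
	for i := 0; i < len(parts); i++ {
		switch {
		case valueProtocols[parts[i]]:
			i++ // skip the value that follows this protocol
		case parts[i] == "http" || parts[i] == "https":
			return true
		}
	}
	return false
}

func main() {
	for _, a := range []string{
		"/dns4/dag.w3s.link/tcp/443/https",
		"/ip4/127.0.0.1/tcp/4001",
	} {
		fmt.Println(a, hasHTTPTransport(a))
	}
}
```

A caller could use this to decide per provider whether to attempt HTTP retrieval or fall back to bitswap.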

lidel avatar Apr 23 '24 13:04 lidel

Update:

  • cid.contact/routing/v1 now returns records with the peer schema (https://github.com/ipni/indexstar/pull/185)

hacdias avatar Apr 24 '24 08:04 hacdias

Something we could try, without changing too much or touching higher-level abstractions like exchanges, is an opportunistic HTTP fetch in boxo/bitswap itself.

Wrote initial thoughts in https://github.com/ipfs/boxo/issues/608 – pinged some folks, looking for feasibility feedback.

lidel avatar May 06 '24 21:05 lidel

This issue can be closed now that https://github.com/ipfs/rainbow/pull/242 is merged.

2color avatar Mar 07 '25 10:03 2color