
gateway: batching raw block requests (AKA userland traversal of DAGs with unknown codecs)

Open lidel opened this issue 11 months ago • 0 comments

Performance gap in trustless retrieval

Something we've identified during user research is the need to fetch multiple arbitrary raw blocks in a single request.

This comes up in two use cases:

  • content-addressed data built with custom codecs (such as blockchains or BitTorrent)
  • error handling in retrieval clients, and sharding or resuming a partial download (for example, fetching only a specific layer of a DAG)

In both cases the DAG is not traversable on the backend, but the client is still able to retrieve it block by block: reading the root, learning about its child branches, and then requesting each of them as application/vnd.ipld.raw.
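To make the cost concrete, here is a minimal sketch of that userland traversal. The gateway host, the `fetch_block` callback, and the toy codec (links stored as comma-separated CIDs) are all assumptions for illustration; a real client would plug in an HTTP fetcher and its own codec.

```python
from typing import Callable, Dict, List, Tuple

RAW_ACCEPT = "application/vnd.ipld.raw"


def raw_block_request(gateway: str, cid: str) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for one raw-block request to a trustless gateway."""
    return f"{gateway.rstrip('/')}/ipfs/{cid}", {"Accept": RAW_ACCEPT}


def traverse(
    root: str,
    fetch_block: Callable[[str], bytes],
    decode_links: Callable[[bytes], List[str]],
) -> Tuple[Dict[str, bytes], int]:
    """Walk a DAG one block per request; returns (blocks, round_trips)."""
    blocks: Dict[str, bytes] = {}
    pending = [root]
    round_trips = 0
    while pending:
        cid = pending.pop()
        if cid in blocks:
            continue  # already fetched via another parent
        blocks[cid] = fetch_block(cid)  # one request per block
        round_trips += 1
        pending.extend(decode_links(blocks[cid]))  # only the client knows the codec
    return blocks, round_trips


# Toy in-memory "gateway" and codec, purely hypothetical.
STORE = {"root": b"a,b", "a": b"c", "b": b"", "c": b""}


def toy_links(block: bytes) -> List[str]:
    return [cid for cid in block.decode().split(",") if cid]


blocks, round_trips = traverse("root", STORE.__getitem__, toy_links)
print(len(blocks), round_trips)
```

In this sketch every block costs one round trip, so the number of requests grows with the size of the DAG.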

The downside is the number of unnecessary roundtrips when multiple CIDs could be requested at the same time.

We want to remove the gateway as an innovation choke point and improve performance in use cases where content-addressable data can't be traversed by the gateway, but can still be retrieved block by block.

The need

The specification should include a canonical way of batching multiple application/vnd.ipld.raw requests into a single request: asking a trustless gateway for N CIDs, and getting the related blocks back without spending resources on multiple requests.

It could be a new request-response type, or a clarification around multiplexing present in HTTP/2 and HTTP/3.
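As a rough illustration of the multiplexing option, the sketch below fetches each layer of a DAG concurrently. A thread pool stands in for multiplexed streams on one HTTP/2 or HTTP/3 connection, which is an assumption made to keep the example self-contained; `fetch_block`, `max_in_flight`, and the toy codec are hypothetical names, not part of any spec.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def traverse_by_layer(
    root: str,
    fetch_block: Callable[[str], bytes],
    decode_links: Callable[[bytes], List[str]],
    max_in_flight: int = 8,
) -> Tuple[Dict[str, bytes], int]:
    """Fetch a DAG layer by layer; returns (blocks, waves of concurrent requests)."""
    blocks: Dict[str, bytes] = {}
    seen = {root}
    layer = [root]
    waves = 0
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        while layer:
            # All CIDs of the current layer are requested at the same time.
            fetched = list(pool.map(fetch_block, layer))
            waves += 1
            next_layer: List[str] = []
            for cid, data in zip(layer, fetched):
                blocks[cid] = data
                for child in decode_links(data):
                    if child not in seen:  # skip blocks already requested
                        seen.add(child)
                        next_layer.append(child)
            layer = next_layer
    return blocks, waves


# Toy in-memory "gateway" and codec, purely hypothetical.
STORE = {"root": b"a,b", "a": b"c", "b": b"", "c": b""}


def toy_links(block: bytes) -> List[str]:
    return [cid for cid in block.decode().split(",") if cid]


layer_blocks, waves = traverse_by_layer("root", STORE.__getitem__, toy_links)
print(len(layer_blocks), waves)
```

Here the four blocks arrive in three waves, so latency tracks the depth of the DAG and not the block count. Whether real HTTP/2 or HTTP/3 multiplexing gets close to this is exactly what the benchmark should tell us.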

TODO

  • [ ] benchmark and evaluate whether a new request-response type is actually needed, or whether sending many application/vnd.ipld.raw requests is enough
    • HTTP/2 suffers from head-of-line blocking at the TCP layer, but maybe it is enough?
    • HTTP/3 brings true multiplexing with QUIC
    • Given that HTTP caching should cache individual block responses across any CDNs and other HTTP middleware, do we need a solution for HTTP/1.1?
  • [ ] IF we don't need a new response type, document best practices for maximizing performance on HTTP/2 and HTTP/3.
  • [ ] IF we need a new response type
    • [ ] figure out how to ask for multiple unrelated CIDs in a single request
      • initial idea: https://www.w3.org/DesignIssues/MatrixURIs.html (use ; as CID separator: /ipfs/cid1;cid2;cid3)
    • [ ] figure out what should be the response format
      • initial idea: return application/vnd.ipld.car with roots being the requested CIDs
    • [ ] propose IPIP for https://specs.ipfs.tech/http-gateways/trustless-gateway/
      • [ ] include limit of CIDs requested in a single batch
      • [ ] include notes on CID ordering / cache control (batch responses should be cacheable the same way application/vnd.ipld.raw responses are)
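A small sketch of how the `;` separator idea could look on the client side. Nothing here is specified yet: the `MAX_BATCH` value, the choice to deduplicate and sort CIDs so that the same set always maps to the same cacheable URL, and both helper names are assumptions for discussion only.

```python
from typing import Iterable, List

MAX_BATCH = 32  # hypothetical limit; the real value would be set by the IPIP
PREFIX = "/ipfs/"


def batch_path(cids: Iterable[str], max_batch: int = MAX_BATCH) -> str:
    """Build a batch request path such as /ipfs/cid1;cid2;cid3."""
    # Assumption: deduplicate and sort, so equal sets share one cache key.
    unique = sorted(set(cids))
    if not unique:
        raise ValueError("at least one CID is required")
    if len(unique) > max_batch:
        raise ValueError(f"batch of {len(unique)} CIDs exceeds limit of {max_batch}")
    return PREFIX + ";".join(unique)


def parse_batch_path(path: str) -> List[str]:
    """Split a batch request path back into its CIDs (gateway side)."""
    if not path.startswith(PREFIX):
        raise ValueError("not an /ipfs/ path")
    return [cid for cid in path[len(PREFIX):].split(";") if cid]


path = batch_path(["cid3", "cid1", "cid2", "cid1"])
print(path)
print(parse_batch_path(path))
```

Sorting is only one possible answer to the ordering question. If the response is a CAR whose roots must follow the request order, the canonical form would need to be defined differently.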

lidel, Jul 18 '23 18:07