helia

feat: add explicit support for subdomain gateways

Open: 2color opened this issue 1 year ago • 8 comments

Title

The main goal of this PR is to explicitly support subdomain gateways, so that we avoid the redirects we currently get from https://4everland.io, which no longer supports path gateways.

Since the TrustlessGatewayBlockBrokerInit type has changed, this is a breaking change.
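For context: a path gateway serves content at https://<gateway>/ipfs/<cid>, while a subdomain gateway serves it at https://<cid>.ipfs.<gateway>/. The sketch below shows one way a broker could build block URLs for each style. The GatewayConfig shape and the blockUrl helper are hypothetical illustrations, not the actual TrustlessGatewayBlockBrokerInit type introduced by this PR.

```typescript
// Hypothetical gateway config; the real TrustlessGatewayBlockBrokerInit may differ.
interface GatewayConfig {
  // Base URL of the gateway, e.g. "https://4everland.io"
  url: string
  // true if the gateway serves content at https://<cid>.ipfs.<host>/
  subdomain?: boolean
}

// Build the URL for fetching the block identified by `cid` from `gateway`.
// Assumes `cid` is already encoded in a form valid for the chosen style
// (subdomain style needs a lowercase, DNS-safe encoding such as base32).
function blockUrl (gateway: GatewayConfig, cid: string): string {
  const url = new URL(gateway.url)
  if (gateway.subdomain === true) {
    // Subdomain style: the CID becomes part of the hostname.
    url.hostname = `${cid}.ipfs.${url.hostname}`
    url.pathname = '/'
  } else {
    // Path style: the hostname is unchanged and the CID goes in the path.
    url.pathname = `/ipfs/${cid}`
  }
  return url.toString()
}
```

For example, blockUrl({ url: 'https://4everland.io', subdomain: true }, cid) yields https://<cid>.ipfs.4everland.io/, which should let the broker skip the path-to-subdomain redirect entirely.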

Change checklist

  • [x] I have performed a self-review of my own code
  • [x] I have made corresponding changes to the documentation if necessary (this includes comments as well)
  • [x] I have added tests that prove my fix is effective or that my feature works

2color, Feb 20 '24

@lidel I'm not thrilled about making HTTP calls to the gateways in the constructor just to test for redirection, because it introduces a side effect. But I can see the value in it, especially for user-passed gateways.

The other problem is that you pay the runtime cost of an HTTP call every time the broker is instantiated, rather than configuring it correctly once.

When you say "avoid hardcoding subdomain status", do you mean keeping the logic that differentiates between subdomain and path gateways, but autodetecting the gateway type in code with an HTTP request instead of having it passed in?
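One way to do the autodetection discussed here is to request a CID in path form with redirects disabled and inspect the Location header. This is a minimal sketch assuming a runtime such as Node.js, where fetch with redirect: 'manual' exposes the 3xx status and Location header; in browsers the same call resolves to an opaque-redirect response (status 0, no readable headers), so this probe would not work there. The names probeGatewayMode, isSubdomainRedirect and FetchLike are hypothetical, and the check assumes the CID is already subdomain-compatible and that the gateway redirects to a subdomain of its own host (some gateways redirect to a different host).

```typescript
type GatewayMode = 'path' | 'subdomain'

// Minimal fetch-like signature so the sketch does not depend on DOM typings.
type FetchLike = (url: string, init?: { redirect?: 'manual' | 'follow' }) =>
  Promise<{ status: number, headers: { get: (name: string) => string | null } }>

// True if a response to a path-style request is a redirect to the
// subdomain form of the same CID on the same gateway host.
function isSubdomainRedirect (gatewayUrl: string, cid: string, status: number, location: string | null): boolean {
  if (status < 300 || status > 399 || location === null) return false
  // Location may be relative, so resolve it against the gateway URL.
  const target = new URL(location, gatewayUrl)
  return target.hostname === `${cid}.ipfs.${new URL(gatewayUrl).hostname}`
}

// Probe once by requesting /ipfs/<cid> without following redirects.
async function probeGatewayMode (gatewayUrl: string, cid: string, fetchImpl: FetchLike): Promise<GatewayMode> {
  const url = new URL(`/ipfs/${cid}`, gatewayUrl).toString()
  const res = await fetchImpl(url, { redirect: 'manual' })
  return isSubdomainRedirect(gatewayUrl, cid, res.status, res.headers.get('location'))
    ? 'subdomain'
    : 'path'
}
```

In Node 18+ the global fetch can be passed directly as fetchImpl. Each call is one extra request, which is exactly the per-instance cost raised above if it runs in the constructor.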

2color, Feb 21 '24

Unless I'm missing something, looking at the responses from http delegated routers for get provs, this check might be something we need to do?

The peer schema for providers includes protocols like "transport-ipfs-gateway-http", but it doesn't tell you whether it's a subdomain or a path gateway.

achingbrain, Feb 21 '24

Unless I'm missing something, looking at the responses from http delegated routers for get provs, this check might be something we need to do?

I think it's even more complex than that either way, and I'm not sure it's in scope for this PR. For example, https://delegated-ipfs.dev/routing/v1/providers/bafybeicklkqcnlvtiscr2hzkubjwnwjinvskffn4xorqeduft3wq7vm5u4 returns these two records:

[
  {
    "Addrs": [
      "/ip4/212.6.53.91/tcp/80/http"
    ],
    "ID": "12D3KooWHEzPJNmo4shWendFFrxDNttYf8DW4eLC7M2JzuXHC1hE",
    "Metadata": "oBIA",
    "Protocol": "transport-ipfs-gateway-http",
    "Schema": "unknown"
  },
  {
    "Addrs": [
      "/dns4/dag.w3s.link/tcp/443/https"
    ],
    "ID": "QmUA9D3H7HeCYsirB3KmPSvZh3dNXMZas6Lwgr4fv1HTTp",
    "Metadata": "oBIA",
    "Protocol": "transport-ipfs-gateway-http",
    "Schema": "unknown"
  }
]
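These records carry multiaddrs rather than URLs, and nothing in them says whether an endpoint is a path or subdomain gateway. A sketch of turning such records into candidate base URLs, assuming only the simple /<ip4|ip6|dns|dns4|dns6>/<host>/tcp/<port>/<http|https> shapes seen here (a real implementation would likely use a multiaddr library). ProviderRecord mirrors the JSON fields shown; multiaddrToUrl and gatewayBaseUrls are hypothetical helpers.

```typescript
// Shape of the delegated routing provider records shown above.
interface ProviderRecord {
  Addrs: string[]
  ID: string
  Metadata?: string
  Protocol?: string
  Schema: string
}

// Convert "/<ip4|ip6|dns|dns4|dns6>/<host>/tcp/<port>/<http|https>" to a base URL.
// Any other multiaddr shape returns null.
function multiaddrToUrl (addr: string): string | null {
  const parts = addr.split('/')
  if (parts.length !== 6 || parts[0] !== '' || parts[3] !== 'tcp') return null
  const [, proto, host, , port, scheme] = parts
  if (!/^(ip4|ip6|dns|dns4|dns6)$/.test(proto)) return null
  if (scheme !== 'http' && scheme !== 'https') return null
  const hostPart = proto === 'ip6' ? `[${host}]` : host
  // Leave out default ports so the URL matches what a user would type.
  const defaultPort = scheme === 'https' ? '443' : '80'
  return port === defaultPort ? `${scheme}://${hostPart}` : `${scheme}://${hostPart}:${port}`
}

// Base URLs of providers that advertise the HTTP gateway transport.
// Whether each one is a path or subdomain gateway is NOT in the record.
function gatewayBaseUrls (records: ProviderRecord[]): string[] {
  const urls: string[] = []
  for (const record of records) {
    if (record.Protocol !== 'transport-ipfs-gateway-http') continue
    for (const addr of record.Addrs) {
      const url = multiaddrToUrl(addr)
      if (url !== null) urls.push(url)
    }
  }
  return urls
}
```

Applied to the two records, this yields http://212.6.53.91 and https://dag.w3s.link. The plain http:// URL can be fetched from Node.js, but a page served over HTTPS would block it as mixed content.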

The first one isn't helpful because there's no TLS cert, but the second one isn't really helpful either because it only supports CARs:

curl -i -H "Accept: application/vnd.ipld.raw" "https://dag.w3s.link/ipfs/bafybeicklkqcnlvtiscr2hzkubjwnwjinvskffn4xorqeduft3wq7vm5u4"
HTTP/2 406
date: Wed, 21 Feb 2024 17:34:46 GMT
content-type: text/plain;charset=UTF-8
content-length: 14
server: cloudflare
cf-ray: 8590bdf19a9344fe-TXL

not acceptable

curl -H "Accept: application/vnd.ipld.car" -i "https://dag.w3s.link/ipfs/bafybeicklkqcnlvtiscr2hzkubjwnwjinvskffn4xorqeduft3wq7vm5u4"
HTTP/2 200
date: Wed, 21 Feb 2024 17:35:40 GMT
content-type: application/vnd.ipld.car; version=1; order=undefined; dups=y
cf-ray: 8590bf3fbef84528-TXL
accept-ranges: none
access-control-allow-origin: *
cache-control: public, max-age=29030400, immutable
content-disposition: attachment; filename="bafybeicklkqcnlvtiscr2hzkubjwnwjinvskffn4xorqeduft3wq7vm5u4.car"; filename*=UTF-8''bafybeicklkqcnlvtiscr2hzkubjwnwjinvskffn4xorqeduft3wq7vm5u4.car
etag: W/"bafybeicklkqcnlvtiscr2hzkubjwnwjinvskffn4xorqeduft3wq7vm5u4.car"
vary: Accept, Accept-Encoding
access-control-allow-methods: GET
access-control-expose-headers: Content-Length
x-content-type-options: nosniff
x-freeway-version: 2.15.0
server: cloudflare
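The two curl calls differ only in the Accept header: a trustless gateway returns a single raw block for application/vnd.ipld.raw and a CAR stream for application/vnd.ipld.car, and a gateway that does not support the requested format can answer 406 Not Acceptable, as dag.w3s.link did here at the time. A sketch of a raw-block fetch that reports 406 as "this gateway cannot serve raw blocks" rather than a generic failure; fetchRawBlock and the injected FetchLike are hypothetical names.

```typescript
// Minimal fetch-like signature so the sketch does not depend on DOM typings.
type FetchLike = (url: string, init?: { headers?: Record<string, string> }) =>
  Promise<{ status: number, arrayBuffer: () => Promise<ArrayBuffer> }>

async function fetchRawBlock (gatewayUrl: string, cid: string, fetchImpl: FetchLike): Promise<Uint8Array> {
  // Path-style URL; strip trailing slashes from the base first.
  const url = `${gatewayUrl.replace(/\/+$/, '')}/ipfs/${cid}`
  const res = await fetchImpl(url, { headers: { Accept: 'application/vnd.ipld.raw' } })
  if (res.status === 406) {
    // The gateway does not offer raw blocks at all, so a broker could skip it.
    throw new Error(`${gatewayUrl} does not serve raw blocks`)
  }
  if (res.status !== 200) {
    throw new Error(`unexpected status ${res.status} from ${url}`)
  }
  // A real broker should also hash these bytes and compare them with the CID,
  // since responses from a trustless gateway are not trusted as-is.
  return new Uint8Array(await res.arrayBuffer())
}
```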

2color, Feb 21 '24

Should be working now for raw blocks, sorry about that:

curl -H "Accept: application/vnd.ipld.raw" https://dag.w3s.link/ipfs/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi --output block.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  116k  100  116k    0     0   153k      0 --:--:-- --:--:-- --:--:--  154k

Feel free to ping me about dag.w3s.link problems. We set that gateway up very quickly for Saturn, and they've never requested raw blocks.

alanshaw, Feb 29 '24

Following the discussion in Helia-WG, the question came up of whether we want to detect at runtime whether a gateway is a subdomain or path gateway (either by leveraging the preflight request and this spec, https://github.com/ipfs/specs/pull/425, or by using heuristics). I don't believe we made a decision, right @lidel?

2color, Feb 29 '24

We did not make a decision as such, but we did talk a bit about the constraints.

  1. It turns out that the absence of transport-ipfs-gateway-http is not a sufficient reason to assume that a given peer is not running a path or subdomain gateway
  2. Some CIDs cannot be used with a subdomain gateway (e.g. very long CIDs, or CIDs in a case-sensitive encoding such as base58btc)
  3. Gateways may redirect you to a subdomain if it's supported and the CID can be used this way
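Point 2 follows from DNS rules: a hostname label is limited to 63 characters (RFC 1035) and hostnames are case-insensitive, so the CID must use a lowercase multibase such as base32 (prefix b) or base36 (prefix k) and fit in a single label. A rough heuristic check, assuming the CID is already encoded as a string; isSubdomainCompatible is a hypothetical helper and does not validate that the string is a well-formed CID.

```typescript
// Upper bound on the length of a single DNS label (RFC 1035).
const MAX_DNS_LABEL_LENGTH = 63

// Heuristic: can this CID string be used as the "<cid>" label in
// https://<cid>.ipfs.<gateway>/ ?
function isSubdomainCompatible (cid: string): boolean {
  if (cid.length === 0 || cid.length > MAX_DNS_LABEL_LENGTH) return false
  // Only case-insensitive multibase prefixes survive DNS lowercasing:
  // 'b' = base32 lowercase, 'k' = base36 lowercase.
  const prefix = cid[0]
  if (prefix !== 'b' && prefix !== 'k') return false
  // Lowercase letters and digits only (covers both alphabets).
  return /^[a-z0-9]+$/.test(cid)
}
```

The 59-character base32 CID from the delegated routing example fits; CIDv0 strings (base58btc, starting with Qm) never pass, and CIDs with larger or identity multihashes can exceed the 63-character limit.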

Given the above, I think there's still value in allowing the user to specify if a gateway is 100% absolutely for sure a subdomain gateway, but if not we should start by having them as a path gateway, examine the CID we are requesting, if it can be used in a subdomain and we receive a redirect to a subdomain URL we can flip that gateway into subdomain mode for future requests.

If the CID cannot be used in a subdomain, we should treat the gateway as a path gateway for this request*.
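The strategy in the last two paragraphs amounts to a small per-gateway state machine: start in path mode unless the user has pinned subdomain mode, flip to subdomain mode after observing a redirect to <cid>.ipfs.<host>, and fall back to a path URL for any individual CID that cannot be a DNS label. A sketch under those assumptions; GatewayState, fitsInSubdomain and their methods are hypothetical, and the compatibility test is a simplified version of the DNS-label rule.

```typescript
type GatewayMode = 'path' | 'subdomain'

// Simplified DNS-label check: lowercase base32 ('b') or base36 ('k') string
// of at most 63 characters.
function fitsInSubdomain (cid: string): boolean {
  return cid.length <= 63 && /^[bk][a-z0-9]*$/.test(cid)
}

class GatewayState {
  private readonly baseUrl: URL
  private mode: GatewayMode

  // forceSubdomain: the user is certain this is a subdomain gateway.
  constructor (baseUrl: string, forceSubdomain = false) {
    this.baseUrl = new URL(baseUrl)
    this.mode = forceSubdomain ? 'subdomain' : 'path'
  }

  get currentMode (): GatewayMode {
    return this.mode
  }

  // URL for a single request. A CID that cannot be a DNS label uses the
  // path form for this request only; the gateway's mode is unchanged.
  urlFor (cid: string): string {
    const { protocol, host } = this.baseUrl
    if (this.mode === 'subdomain' && fitsInSubdomain(cid)) {
      return `${protocol}//${cid}.ipfs.${host}/`
    }
    return `${protocol}//${host}/ipfs/${cid}`
  }

  // Call with the Location header of a response to a path-style request.
  // A redirect to <cid>.ipfs.<host> flips the gateway into subdomain mode.
  observeRedirect (cid: string, location: string | null): void {
    if (location === null || !fitsInSubdomain(cid)) return
    const target = new URL(location, this.baseUrl)
    if (target.hostname === `${cid}.ipfs.${this.baseUrl.hostname}`) {
      this.mode = 'subdomain'
    }
  }
}
```

Keeping the mode per gateway instance means the flip only has to be learned once, and the per-request fallback never downgrades a gateway that is known to support subdomains.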

The preflight request should help here, but AFAIK it's not available to browser app code. Can we query the cache for it and detect it that way?


* = What if we try to convert it to a subdomain-compatible CID, e.g. v0 base58btc to v1 base36? Test for length, etc.
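The footnote's conversion can be sketched without dependencies: a CIDv0 is a base58btc-encoded sha2-256 multihash, the equivalent CIDv1 prepends the version byte 0x01 and the dag-pb codec 0x70, and multibase base36 is the lowercase base-36 digits of those bytes behind a k prefix. This hand-rolled BigInt version is for illustration only; in practice the multiformats package (CID.parse(...).toV1(), then toString with its base36 encoder) would typically handle it. cidV0ToV1Base36 and base58btcDecode are hypothetical helpers, and the result should still go through a length check.

```typescript
// base58btc alphabet (Bitcoin ordering), used by CIDv0 strings.
const BASE58BTC = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

// Decode a base58btc string into big-endian bytes, keeping leading zero bytes.
function base58btcDecode (s: string): Uint8Array {
  let n = 0n
  for (const ch of s) {
    const digit = BASE58BTC.indexOf(ch)
    if (digit < 0) throw new Error(`invalid base58btc character: ${ch}`)
    n = n * 58n + BigInt(digit)
  }
  const bytes: number[] = []
  while (n > 0n) {
    bytes.unshift(Number(n & 0xffn))
    n >>= 8n
  }
  // Each leading '1' stands for one leading 0x00 byte.
  for (let i = 0; i < s.length && s[i] === '1'; i++) bytes.unshift(0)
  return Uint8Array.from(bytes)
}

// Convert a CIDv0 ("Qm...") to a CIDv1 string in multibase base36 ("k...").
function cidV0ToV1Base36 (v0: string): string {
  if (v0.length !== 46 || !v0.startsWith('Qm')) throw new Error('not a CIDv0 string')
  const multihash = base58btcDecode(v0)
  // A CIDv0 is a sha2-256 multihash: 0x12 (code), 0x20 (32-byte digest length).
  if (multihash.length !== 34 || multihash[0] !== 0x12 || multihash[1] !== 0x20) {
    throw new Error('expected a sha2-256 multihash')
  }
  // CIDv1 = <version 0x01><codec dag-pb 0x70><multihash>; both varints fit in one byte.
  const v1 = Uint8Array.from([0x01, 0x70, ...multihash])
  let n = 0n
  for (const b of v1) n = (n << 8n) | BigInt(b)
  // BigInt#toString(36) uses 0-9a-z, matching the multibase base36 alphabet.
  // The first byte is 0x01, so there are no leading zero bytes to encode.
  return 'k' + n.toString(36)
}
```

A 36-byte CIDv1 encodes to well under 63 base36 characters, so converted CIDv0s fit in a DNS label; CIDv1s that are already in another multibase would need their own decoding step first.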

achingbrain, Mar 04 '24

Given the above, I think there's still value in allowing the user to specify if a gateway is 100% absolutely for sure a subdomain gateway, but if not we should start by having them as a path gateway, examine the CID we are requesting, if it can be used in a subdomain and we receive a redirect to a subdomain URL we can flip that gateway into subdomain mode for future requests

I suppose we'd do this check the first time a given GatewayBroker requests a block, to avoid an unnecessary request and side effects at instantiation. Do you agree with the broad strokes of this approach?
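Deferring the check to the first block request can be expressed as a lazily created, memoized promise: the constructor performs no I/O, concurrent first requests share a single probe, and later requests reuse the result. A sketch; LazyGatewayMode and the injected probe function are hypothetical, and clearing the cached promise after a failed probe so a later request can retry is a design choice, not something agreed in this thread.

```typescript
type GatewayMode = 'path' | 'subdomain'

class LazyGatewayMode {
  private pending: Promise<GatewayMode> | undefined
  private readonly probe: () => Promise<GatewayMode>

  // Storing the probe function performs no I/O.
  constructor (probe: () => Promise<GatewayMode>) {
    this.probe = probe
  }

  // The first caller triggers the probe; everyone else awaits the same promise.
  get (): Promise<GatewayMode> {
    if (this.pending === undefined) {
      this.pending = this.probe().catch((err: unknown) => {
        // Forget the failed attempt so a later request can retry.
        this.pending = undefined
        throw err
      })
    }
    return this.pending
  }
}
```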


Also dropping this link where we recently implemented the conversion to subdomain resolution.

2color, Mar 05 '24

Do you agree with the broad strokes of this approach?

Yes, sounds good.

achingbrain, Mar 05 '24