kubo icon indicating copy to clipboard operation
kubo copied to clipboard

feat: serve CBOR encoded DAG nodes from the gateway

Open alanshaw opened this issue 3 years ago • 11 comments

This PR enables the gateway to serve CBOR encoded DAG nodes. They are returned as JSON objects, the same encoding used on the CLI.

Closes https://github.com/ipfs/go-ipfs/issues/8823 and https://github.com/ipfs/in-web-browsers/issues/182

alanshaw avatar Mar 30 '21 09:03 alanshaw

I've come to realise a possible reason for CBOR (or other encoding) not being implemented yet is pathing within a node - typically you can /ipfs/{cid}/internal/path to get the value 138 out of a node like { internal: { path: 138 } } - this PR does not take this into account and I'm not sure how it would work right now if we allow returning raw cbor (I guess re-encode 138 in CBOR?).

alanshaw avatar Apr 07 '21 08:04 alanshaw

@alanshaw iiuc our path traversal semantics, pathing makes sense only when we talk about separate documents, so it should work only for CID tags within dag-cbor. This aspect of this PR lgtm.

Also, pathing makes it impossible to extract more than one value, so I suggested support for graphql-like queries that return only a subset of fields from a document in https://github.com/ipfs/in-web-browsers/issues/182, but that requires more design discussion (needs to be future-proof and compatible with IPLD selectors), lets descope it from this PR.

Remaining work here (so we dont miss anything):

  • informative error when no explicit format is requested
  • add sharness tests
  • make another pass at HTTP headers (ensure we include the same set of headers as for dag-pb)

lidel avatar Apr 07 '21 13:04 lidel

@alanshaw After sleeping on this I think we should not return JSON by default, but instead return an error informing user to retry request with explicit format (JSON or CBOR) either via content-type header or ?enc= param.

That ensures devs will build on deterministic response types and enable us to introduce user-friendly UI for exploring DAGs in the future, as suggested in https://github.com/ipfs/in-web-browsers/issues/182#issuecomment-815039662

lidel avatar Apr 08 '21 13:04 lidel

I think we should not return JSON by default, but instead return an error informing user to retry request with explicit format (JSON or CBOR) either via content-type header or ?enc= param.

That ensures devs will build on deterministic response types and enable us to introduce user-friendly UI for exploring DAGs in the future

Big agree, :100: . I think we should leave room for that kind of experience to be built into the Gateway. It would be consistent with what we do for unixfsv1 directories already and we should leave room to carry that design logic further; and for the encoding types, explicit is good.

warpfork avatar Apr 08 '21 16:04 warpfork

Let me join the chorus suggesting against converting objects to JSON by default.

Despite it being almost the only case right now, unixfs is a special case of IPLD data that can be interpreted by a gateway. When a client asks the gateway for some CID and that CID is a unixfs root, the gateway doesn't give the client what they actually asked for (an IPLD node), but infers a request for the DAG data to be traversed and rendered to a flat file (which does not have the same hash). Or analogous behaviour for a directory. This is great for the use case of URLs in hyperlinks, rendering in browsers etc. We don't ask the client to pass headers or query parameters for this "special" rendering behaviour because it's too awkward for the browser URL bar case; so the default is to render.

Imagine some node that the gateway currently doesn't know how to render, but in the future might be told how to render as something a browser can handle. We should preserve a similar default approach of assuming that the request is from a browser URL bar. Maybe some third party has an alternative schema for chunking flat files, and we later add support for it too. We don't want to change the response data and break old clients relying on the default JSON.

Most other contexts outside a browser URL bar have easy capacity to pass parameters describing what content they want (JS code in browsers, server-side code, curl invocations, ...). It seems a bit backwards that "give me the raw data with this CID" is not the no-parameter default, but the browser case determines that. So IMO all the other response types should be explicit, controlled by parameters from the client. Maybe we even need an ?enc= (blank value) to request no transformation, in case the caller doesn't know (or want to work out from CID) what the native encoding is.

When serving IPLD data, another reason not to JSONify everything is that it breaks the verifiability of the content from the CID. It's ok for that to be an explicit request, but unfortunate as a default behaviour. JS code can parse CBOR too.

IMHO we should be serving data from the gateway that is by default the easiest format to consume from a web application.

I would tweak this. We should be serving data from the gateway that is by default the easiest format to consume from a web browser, because the URL bar is the least good place to require parameters describing what the client wants, and is already set by precedent. But web applications are just as capable as other clients of requesting a particular data format – the request and response will both usually be handled by code.

anorth avatar Apr 27 '21 01:04 anorth

Switched to draft while this is still undergoing some design discussion and isn't quite ready for mergable-reviewing.

aschmahmann avatar May 10 '21 15:05 aschmahmann

This might become much easier once we have https://github.com/ipfs/go-ipfs/pull/7995 which will allow for a dag get to use arbitrary ipld-prime codec encodings. cc @aschmahmann for recommendation on if this implies the cmds route for encoding vs as enumerated encoders on the specific subcommands

willscott avatar May 24 '21 21:05 willscott

Quick check-in note:

  • this needs to be refactored to use ?format= proposed in https://github.com/ipfs/go-ipfs/issues/8234
    • may be easier to tackle block car cbor and json use cases in a single PR to ensure DRY code around HTTP headers (either reuse this one, or open a new one)

lidel avatar Jul 02 '21 12:07 lidel

Note to self: after https://github.com/ipfs/go-ipfs/pull/8758 lands we revisit this and add dag-cbor and dag-json formats using the newly created reusable code for cache control and content disposition.

lidel avatar Mar 03 '22 02:03 lidel

@lidel : given https://github.com/ipfs/go-ipfs/pull/8758 has landed, I assume this is unblocked? I'll put this down for go-ipfs 0.14 or are you thinking to do sooner or suggest someone else in the community take it?

BigLep avatar Mar 23 '22 17:03 BigLep

@BigLep yes, this is unblocked, I've created https://github.com/ipfs/go-ipfs/issues/8823 with some additional notes, so it is not blocked on my availability.

Aiming at 0.14 sounds good: ideally we would resolve https://github.com/ipfs/go-ipfs/pull/8568 before this ships, as people will start using dag-json and dag-cbor way more thanks to this, and we want good UX around it, but in this specific case it is not a hard blocker.

Implementation wise, it should be pretty easy to add on top of refactor that happened in https://github.com/ipfs/go-ipfs/pull/8758:

https://github.com/ipfs/go-ipfs/blob/5b43672a8733525ad9fab9be03d14d983f3f9914/core/corehttp/gateway_handler.go#L307-L326

Anyone with spare time could do it, given a spare day. Writing tests will take most of the time here, but we could reuse test fixtures from:

  • https://ipld.io/specs/codecs/dag-pb/fixtures/cross-codec/
  • https://ipld.io/specs/codecs/dag-json/fixtures/cross-codec/
  • https://ipld.io/specs/codecs/dag-cbor/fixtures/cross-codec/

(at minimum, we want to test the real dag-json and dag-cbor + have one for dag-pb represented as dag-json)

lidel avatar Mar 24 '22 20:03 lidel

Continued in https://github.com/ipfs/kubo/pull/9335

lidel avatar Oct 11 '22 18:10 lidel