kubo
kubo copied to clipboard
feat: serve CBOR encoded DAG nodes from the gateway
This PR enables the gateway to serve CBOR encoded DAG nodes. They are returned as JSON objects, the same encoding used on the CLI.
Closes https://github.com/ipfs/go-ipfs/issues/8823 and https://github.com/ipfs/in-web-browsers/issues/182
I've come to realise a possible reason for CBOR (or other encoding) not being implemented yet is pathing within a node - typically you can /ipfs/{cid}/internal/path
to get the value 138
out of a node like { internal: { path: 138 } }
- this PR does not take this into account and I'm not sure how it would work right now if we allow returning raw cbor (I guess re-encode 138
in CBOR?).
@alanshaw iiuc our path traversal semantics, pathing makes sense only when we talk about separate documents, so it should work only for CID tags within dag-cbor
. This aspect of this PR lgtm.
Also, pathing makes it impossible to extract more than one value, so I suggested support for graphql-like queries that return only a subset of fields from a document in https://github.com/ipfs/in-web-browsers/issues/182, but that requires more design discussion (needs to be future-proof and compatible with IPLD selectors), lets descope it from this PR.
Remaining work here (so we dont miss anything):
- informative error when no explicit format is requested
- add sharness tests
- make another pass at HTTP headers (ensure we include the same set of headers as for dag-pb)
@alanshaw After sleeping on this I think we should not return JSON by default, but instead return an error informing user to retry request with explicit format (JSON or CBOR) either via content-type
header or ?enc=
param.
That ensures devs will build on deterministic response types and enable us to introduce user-friendly UI for exploring DAGs in the future, as suggested in https://github.com/ipfs/in-web-browsers/issues/182#issuecomment-815039662
I think we should not return JSON by default, but instead return an error informing user to retry request with explicit format (JSON or CBOR) either via content-type header or ?enc= param.
That ensures devs will build on deterministic response types and enable us to introduce user-friendly UI for exploring DAGs in the future
Big agree, :100: . I think we should leave room for that kind of experience to be built into the Gateway. It would be consistent with what we do for unixfsv1 directories already and we should leave room to carry that design logic further; and for the encoding types, explicit is good.
Let me join the chorus suggesting against converting objects to JSON by default.
Despite it being almost the only case right now, unixfs is a special case of IPLD data that can be interpreted by a gateway. When a client asks the gateway for some CID and that CID is a unixfs root, the gateway doesn't give the client what they actually asked for (an IPLD node), but infers a request for the DAG data to be traversed and rendered to a flat file (which does not have the same hash). Or analogous behaviour for a directory. This is great for the use case of URLs in hyperlinks, rendering in browsers etc. We don't ask the client to pass headers or query parameters for this "special" rendering behaviour because it's too awkward for the browser URL bar case; so the default is to render.
Imagine some node that the gateway currently doesn't know how to render, but in the future might be told how to render as something a browser can handle. We should preserve a similar default approach of assuming that the request is from a browser URL bar. Maybe some third party has an alternative schema for chunking flat files, and we later add support for it too. We don't want to change the response data and break old clients relying on the default JSON.
Most other contexts outside a browser URL bar have easy capacity to pass parameters describing what content they want (JS code in browsers, server-side code, curl
invocations, ...). It seems a bit backwards that "give me the raw data with this CID" is not the no-parameter default, but the browser case determines that. So IMO all the other response types should be explicit, controlled by parameters from the client. Maybe we even need an ?enc=
(blank value) to request no transformation, in case the caller doesn't know (or want to work out from CID) what the native encoding is.
When serving IPLD data, another reason not to JSONify everything is that it breaks the verifiability of the content from the CID. It's ok for that to be an explicit request, but unfortunate as a default behaviour. JS code can parse CBOR too.
IMHO we should be serving data from the gateway that is by default the easiest format to consume from a web application.
I would tweak this. We should be serving data from the gateway that is by default the easiest format to consume from a web browser, because the URL bar is the least good place to require parameters describing what the client wants, and is already set by precedent. But web applications are just as capable as other clients of requesting a particular data format – the request and response will both usually be handled by code.
Switched to draft while this is still undergoing some design discussion and isn't quite ready for mergable-reviewing.
This might become much easier once we have https://github.com/ipfs/go-ipfs/pull/7995 which will allow for a dag get
to use arbitrary ipld-prime codec encodings. cc @aschmahmann for recommendation on if this implies the cmds route for encoding vs as enumerated encoders on the specific subcommands
Quick check-in note:
- this needs to be refactored to use
?format=
proposed in https://github.com/ipfs/go-ipfs/issues/8234- may be easier to tackle
block
car
cbor
andjson
use cases in a single PR to ensure DRY code around HTTP headers (either reuse this one, or open a new one)
- may be easier to tackle
Note to self: after https://github.com/ipfs/go-ipfs/pull/8758 lands we revisit this and add dag-cbor and dag-json formats using the newly created reusable code for cache control and content disposition.
@lidel : given https://github.com/ipfs/go-ipfs/pull/8758 has landed, I assume this is unblocked? I'll put this down for go-ipfs 0.14 or are you thinking to do sooner or suggest someone else in the community take it?
@BigLep yes, this is unblocked, I've created https://github.com/ipfs/go-ipfs/issues/8823 with some additional notes, so it is not blocked on my availability.
Aiming at 0.14 sounds good: ideally we would resolve https://github.com/ipfs/go-ipfs/pull/8568 before this ships, as people will start using dag-json and dag-cbor way more thanks to this, and we want good UX around it, but in this specific case it is not a hard blocker.
Implementation wise, it should be pretty easy to add on top of refactor that happened in https://github.com/ipfs/go-ipfs/pull/8758:
https://github.com/ipfs/go-ipfs/blob/5b43672a8733525ad9fab9be03d14d983f3f9914/core/corehttp/gateway_handler.go#L307-L326
Anyone with spare time could do it, given a spare day. Writing tests will take most of the time here, but we could reuse test fixtures from:
- https://ipld.io/specs/codecs/dag-pb/fixtures/cross-codec/
- https://ipld.io/specs/codecs/dag-json/fixtures/cross-codec/
- https://ipld.io/specs/codecs/dag-cbor/fixtures/cross-codec/
(at minimum, we want to test the real dag-json and dag-cbor + have one for dag-pb represented as dag-json)
Continued in https://github.com/ipfs/kubo/pull/9335