Align uri_parser's output with CRIs

Open chrysn opened this issue 5 years ago • 7 comments

Description

Work on a format to replace link-format is going on around the CoRAL format, and chances are that CRIs are the first part of that work to become stable. CRIs are a CBOR-based representation of the information in a URI (largely compatible, and where they are not, it's in areas that implementations often don't get right anyway), and are especially suitable when CoAP requests are later built from them.

The data structure that uri_parser produces is almost aligned with the CRI information model, and I think it'd be convenient to align them to the point where such a struct can be used as an internal representation of a CRI. Then, CRIs from CoRAL documents could be preprocessed into uri_parser_result, and requests built from them.

The current discrepancies are:

  • host is currently in text form; CRIs have either a text DNS name or a binary representation of the IP literal. Parsing this early would make sense as it makes the later use easier, but might be problematic with the uri_parser_result notion of using the original URI as immutable backing store (and even if mutation is allowed, which might make sense, the text form is shorter than the binary form for some addresses).
    • CRIs don't have a zone identifier; that's OK, and the internal representation would have one.
  • port is numeric in CRI; could be changed easily.
    • CRIs allow omitting the port; as long as we only convert internal representations to CRIs where we know the scheme, that can be handled at conversion time.
  • path and query components are separated by the characters '/' and '&' respectively. CRIs can contain almost arbitrary text (including '/' in path components and '&' in query components), which would be percent-escaped in URIs -- but for starters we can choose not to support such URIs at all (the current uri_parser doesn't, as it'd need to percent-decode them for mapping into CoAP). When converting a CRI, the CRI might need to be taken in mutably so that the CBOR framing between components can be replaced with the agreed-on delimiters. (Long-term, being able to express all CRIs would be nice, especially because we do proxying with it, but let's take this step by step.)
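
To make the host and port discrepancies concrete, here is a minimal sketch of what a CRI-aligned result could carry. All names (`cri_host_type_t`, `cri_host_t`, `cri_port_from_text`) are hypothetical and are not part of the existing uri_parser API; the sketch only assumes that the port is available as a text slice and should end up numeric.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types: none of these exist in RIOT's uri_parser. */
typedef enum {
    CRI_HOST_NONE,
    CRI_HOST_NAME,  /* text DNS name, slice of the original URI */
    CRI_HOST_IPV4,  /* 4 bytes of binary address */
    CRI_HOST_IPV6,  /* 16 bytes of binary address */
} cri_host_type_t;

typedef struct {
    cri_host_type_t type;
    union {
        struct {
            const char *str;
            size_t len;
        } name;
        uint8_t ip[16];     /* needs storage outside the URI buffer */
    } value;
    const char *zoneid;     /* not part of a CRI, kept internally */
    size_t zoneid_len;
} cri_host_t;

/* Convert a decimal port slice to a number; rejects empty, overlong,
 * non-digit and out-of-range input. */
static bool cri_port_from_text(const char *port, size_t len, uint16_t *out)
{
    uint32_t value = 0;

    if (len == 0 || len > 5) {
        return false;
    }
    for (size_t i = 0; i < len; i++) {
        if (port[i] < '0' || port[i] > '9') {
            return false;
        }
        value = value * 10 + (uint32_t)(port[i] - '0');
    }
    if (value > UINT16_MAX) {
        return false;
    }
    *out = (uint16_t)value;
    return true;
}
```

The binary `ip` member is where the immutable-backing-store concern shows up: unlike the name slice, it cannot point into the original URI. The length limit of 5 also rejects ports with many leading zeros, which is a simplification of this sketch.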

Useful links

  • CRI draft: https://tools.ietf.org/html/draft-ietf-core-href-14
  • How it'd be used: CoRAL: https://tools.ietf.org/html/draft-ietf-core-coral-03
  • uri_parser result for comparison: https://riot-os.org/api/structuri__parser__result__t.html
  • uri_parser main use case: Proxying #13790

Next steps

I'd keep this around as a tracking issue while uri_parser is being developed further during proxy development; when actual CRI support is added, it can be closed in that PR.

chrysn avatar Apr 07 '20 10:04 chrysn

Thanks for the pointer to CRIs. Has there been any discussion of using CRIs in CoAP itself? For example, there might be a Proxy-Cri option. Or is this work intended for the payload, like link format?

kb2ma avatar Apr 08 '20 12:04 kb2ma

Has there been any discussion of using CRIs in CoAP itself? For example, there might be a Proxy-Cri option.

It's really the other way 'round: CRIs largely follow how CoAP options are formed, so they're a way to use the Uri-* options of a CoAP message in serializations other than CoAP options.

As I see it, except when it comes to Uri-Host being encoded as a string rather than a binary IP, CRIs encoded in CoAP options are Proxy-Scheme and the Uri-* options.

Or is this work intended for the payload, like link format?

It is. CoRAL is considered as a practical successor to link-format. (And personally, I'd also use it in several places where SenML was the way to go before, like core-interface's batches).

chrysn avatar Apr 08 '20 12:04 chrysn

Just a brief update (no work active here but CRIs are changing):

  • Hosts are now in list form as well (dot-separated segments). This is convenient for turning them into DNS requests (no ASCII string parsing any more at all), provided there's a good DNS interface that does not rely on dots being there (which the POSIX getaddrinfo does, but we don't have to replicate that). This doesn't introduce any new fields of work, because we already need some trickery to get space for the v6 address.
  • Paths and queries support a lot more now (using the new PET mechanism), but the additions are inexpressible in CoAP, so we can stick with the CoAP-expressible subset and not much changes.
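
As an illustration of the list-form host, the following sketch splits a dotted host name into labels without copying, so that each label could be handed on individually. `cri_label_t` and `cri_host_split` are made-up names for this example, and rejecting empty labels (including a trailing dot) is an assumption of the sketch, not something the CRI draft is quoted on here.

```c
#include <stddef.h>

/* Hypothetical label slice pointing into the original host string. */
typedef struct {
    const char *str;
    size_t len;
} cri_label_t;

/* Split `host` at '.' into at most `max_labels` slices.
 * Returns the number of labels, or -1 on an empty label or overflow. */
static int cri_host_split(const char *host, size_t host_len,
                          cri_label_t *labels, size_t max_labels)
{
    size_t count = 0;
    size_t start = 0;

    for (size_t i = 0; i <= host_len; i++) {
        if (i == host_len || host[i] == '.') {
            if (i == start) {
                return -1;  /* empty label, e.g. "a..b" or "" */
            }
            if (count == max_labels) {
                return -1;  /* more labels than the caller has room for */
            }
            labels[count].str = host + start;
            labels[count].len = i - start;
            count++;
            start = i + 1;
        }
    }
    return (int)count;
}
```

Going the other way (from CRI labels to a DNS query) would not need the dots at all, which is the convenience mentioned above.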

chrysn avatar Dec 18 '21 14:12 chrysn

Before I start API sketching here I'd like to collect what CRI handling for RIOT might be able to do, sketching use cases:

  • URI parsing: Given a buffer that contains a text URI, parse it into something that can be populated into a CoAP request. (The URI may also be a URI reference, in which case the base might be ... a different CoAP request?).

    Example: User input on a console needs to be placed into a CoAP request (coap get coap://host/path)

  • CRI parsing: Same but with already serialized CRIs.

    Example: Resource discovery returns a CRI reference (eg. equivalent to directory/ in a response to a multicast CoAP request received from [fe80::42]:61616)

  • CoAP request handling: A server (as around https://github.com/RIOT-OS/RIOT/pull/14397#issuecomment-1100185673) receives a request and needs access to the parts of the request's URI that have not been "used up" by the resource dispatch. (For other, primarily non-CRI, purposes, eg. returning Location-* or full URIs, the handler also needs to access the full requested CRI).

    Example: A server in #14397 is attached to the gCoAP server at /fs/. When a request comes in carrying the CoAP options for /fs/mtd0/firmware1.bin?sha1sum, the handler would like to have mtd0/firmware1.bin?sha1sum conveniently at hand (where different users may have different ideas of what "convenient" here means; those considering the string "mtd0/firmware1.bin?sha1sum" to be convenient might reconsider when they learn that actual percent signs inside the file name, which are legal in the path, would be percent-encoded in such a string).

  • CRI producing: A server has some knowledge about where something is located, and needs to produce a CRI for it.

    Example: The .well-known/core resource needs to produce /fs/ or the CRI binary equivalent thereof from the path names registered in the server, depending on the client's Accept value.
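
For the request handling case, one possible shape of "the part not used up by dispatch" is a list of path segments with the mount prefix skipped. The sketch below assumes the path is already available as an array of segment slices (as it is, conceptually, in the Uri-Path options); `path_seg_t` and `path_strip_prefix` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slice of one path segment, without any '/' delimiter. */
typedef struct {
    const char *str;
    size_t len;
} path_seg_t;

/* If `path` starts with all segments of `prefix`, point `rest` at the
 * remaining segments and return true; otherwise return false. */
static bool path_strip_prefix(const path_seg_t *path, size_t path_count,
                              const path_seg_t *prefix, size_t prefix_count,
                              const path_seg_t **rest, size_t *rest_count)
{
    if (prefix_count > path_count) {
        return false;
    }
    for (size_t i = 0; i < prefix_count; i++) {
        if (path[i].len != prefix[i].len ||
            memcmp(path[i].str, prefix[i].str, prefix[i].len) != 0) {
            return false;
        }
    }
    *rest = path + prefix_count;
    *rest_count = path_count - prefix_count;
    return true;
}
```

Because segments are compared as raw bytes, a literal percent sign in a file name needs no escaping here, which is the point made in the /fs/ example above.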

The hard part about unifying these is that they're all present in memory already but in different forms (CoAP option serialization, parts of socket endpoints, CRIs, URIs), and that even if we limited ourselves to the subset of CRIs where all strings are in contiguous memory (which is, essentially, URIs with no percent encoding), the incompatible delimitations mean that a unified CRI that zero-copies will either carry around a rather large list of start and end positions of components (like, 10 x 2 x size_t for up to 5 path components and 5 query components), or needs to behave in a driver-like fashion. (The third alternative is to rewrite the data into a consistent structure, but (a) the original data may be const, and (b) there may not be enough contiguous memory to store the full CRI).
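
To put a number on the "large list of start and end positions", here is a sketch of such a zero-copy view. The type and function names and the limit of 5 + 5 components are taken from the estimate above or invented for this example; nothing like this exists in RIOT today.

```c
#include <stddef.h>
#include <stdint.h>

#define CRI_VIEW_MAX_PATH   (5U)
#define CRI_VIEW_MAX_QUERY  (5U)

/* Hypothetical span: offsets into the backing buffer, end exclusive. */
typedef struct {
    size_t start;
    size_t end;
} cri_span_t;

/* Hypothetical zero-copy view over a URI that is never modified. */
typedef struct {
    const char *base;
    cri_span_t path[CRI_VIEW_MAX_PATH];
    cri_span_t query[CRI_VIEW_MAX_QUERY];
    uint8_t path_count;
    uint8_t query_count;
} cri_view_t;

/* Memory taken by the span tables alone: 10 x 2 x sizeof(size_t). */
static size_t cri_view_span_bytes(void)
{
    return sizeof(((cri_view_t *)0)->path) +
           sizeof(((cri_view_t *)0)->query);
}

/* Return the idx-th path component and its length, or NULL. */
static const char *cri_view_path_get(const cri_view_t *view, size_t idx,
                                     size_t *len)
{
    if (idx >= view->path_count) {
        return NULL;
    }
    *len = view->path[idx].end - view->path[idx].start;
    return view->base + view->path[idx].start;
}
```

On a 32-bit target where size_t is 4 bytes, the span tables come to 10 x 2 x 4 = 80 bytes per view, before host, port and scheme are accounted for. Narrower offset types would shrink this at the cost of limiting the URI length.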

chrysn avatar Apr 15 '22 16:04 chrysn

I'm currently leaning towards starting small and providing a CRIish interface to parsed URIs first; more can then still be done through a driver model.

When processing URIs, this would only accept "easy" URIs, and reliably refuse those it can't process correctly. This would put us in about the same league of features as the current URI parser (no percent encoding etc) but without the conversion errors. Unsupported URIs would include not only those inexpressible in CoAP (like any that use escaped allowed delimiters) but also those that are expressible in CoAP but hard to translate. Practically speaking, the URIs that remain supported are probably the relevant subset.
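
A minimal sketch of the "reliably refuse" part, assuming that the presence of any percent sign is used as the rejection criterion. That is a stand-in for whatever the real criteria turn out to be, and `uri_is_easy` is a hypothetical name:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-check: accept only URIs without percent-encoding,
 * so that components can be used as-is without any decoding step. */
static bool uri_is_easy(const char *uri, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (uri[i] == '%') {
            return false;
        }
    }
    return true;
}
```

A caller would run this before parsing and return an error for rejected URIs, rather than producing a silently wrong CoAP request.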

chrysn avatar Sep 22 '22 07:09 chrysn

chrysn: Work on a format to replace link-format is going on around the CoRAL format, and chances are CRIs are the first part of that that can become stable

I just edited your links from version 3 to version 14 of CRIs. Can you give an opinion on whether they can be considered "stable"? What is the next step here?

Teufelchen1 avatar Mar 12 '24 10:03 Teufelchen1

Teufelchen1 avatar Jun 20 '24 21:06 Teufelchen1