braid-spec icon indicating copy to clipboard operation
braid-spec copied to clipboard

Rename Content-{Length,Type,Range} → Patch-{Length,Type,Range}

Open toomim opened this issue 3 years ago • 13 comments

The headers we specify on patches currently have the Content- prefix:

Request:

   PUT /chat
   Content-Type: application/json
   Patches: 2

   Content-Length: 53                         <-- Content-Length
   Content-Range: json .messages[1:1]         <-- Content-Range

   [{"text": "Yo!",
     "author": {"link": "/user/yobot"}]

   Content-Length: 326                        <-- Content-Length
   Content-Type: application/json-patch+json  <-- Content-Type

   [
     { "op": "test", "path": "/a/b/c", "value": "foo" },
     { "op": "remove", "path": "/a/b/c" },
     { "op": "add", "path": "/a/b/c", "value": [] },
     { "op": "replace", "path": "/a/b/c", "value": 42 },
     { "op": "move", "from": "/a/b", "path": "/a/d },
     { "op": "copy", "from": "/a/d/d", "path": "/a/d/e" }
   ]

We borrowed these from HTTP's existing header names. But upon further reflection, the semantics of "content" might not be totally appropriate.

In HTTP, the term "content" refers to the contents of the resource. (See rfc7231 "HTTP/1.1 Semantics and Content".) However, we are using these terms to refer to a patch. Thus, they are not about the same "content" anymore.

I think it might be clearer if we renamed them to Patch-Length, Patch-Range, and Patch-Type, or perhaps even just Length, Range, and Type. Here is what it would look like with the former:

Request:

   PUT /chat
   Content-Type: application/json
   Patches: 2

   Patch-Length: 53                         <-- Patch-Length
   Patch-Range: json .messages[1:1]         <-- Patch-Range

   [{"text": "Yo!",
     "author": {"link": "/user/yobot"}]

   Patch-Length: 326                        <-- Patch-Length
   Patch-Type: application/json-patch+json  <-- Patch-Type

   [
     { "op": "test", "path": "/a/b/c", "value": "foo" },
     { "op": "remove", "path": "/a/b/c" },
     { "op": "add", "path": "/a/b/c", "value": [] },
     { "op": "replace", "path": "/a/b/c", "value": 42 },
     { "op": "move", "from": "/a/b", "path": "/a/d },
     { "op": "copy", "from": "/a/d/d", "path": "/a/d/e" }
   ]

And here's an example of the latter:

Request:

   PUT /chat
   Content-Type: application/json
   Patches: 2

   Length: 53                         <-- Length
   Range: json .messages[1:1]         <-- Range

   [{"text": "Yo!",
     "author": {"link": "/user/yobot"}]

   Length: 326                        <-- Length
   Type: application/json-patch+json  <-- Type

   [
     { "op": "test", "path": "/a/b/c", "value": "foo" },
     { "op": "remove", "path": "/a/b/c" },
     { "op": "add", "path": "/a/b/c", "value": [] },
     { "op": "replace", "path": "/a/b/c", "value": 42 },
     { "op": "move", "from": "/a/b", "path": "/a/d },
     { "op": "copy", "from": "/a/d/d", "path": "/a/d/e" }
   ]

toomim avatar Feb 25 '21 00:02 toomim

The Patch-* option could help address Issue https://github.com/braid-org/braid-spec/issues/96: "Can we remove Patches: 1 when there's only one patch?"

toomim avatar Feb 25 '21 00:02 toomim

Yes! I like this change. It also has the (perhaps unintended) consequence of making parsing more robust: If there is a mistake made in the number of patches, for example (Patches: 4) but then 5 are sent over the wire, we can detect this as an error state.

canadaduane avatar Feb 25 '21 18:02 canadaduane

Oh, since Patch-Length is a REQUIRED header for a patch, it also simplifies the parser! An implementation's parser need only alternate between two states (headers/body) during parsing instead of 4 states (version headers / version body / patch headers / patch body). The question of "is this a patch header or a version header?" could be handled at a higher level.

Edit: Nevermind, we still need to go into the actually length of the Patch, because \r\n inside a patch would mess up the parser.

canadaduane avatar Feb 25 '21 18:02 canadaduane

Thinking about the names a bit more:

  1. Patch-Length could perhaps be misconstrued to mean "the length of the content in the resource being patched". This would be incorrect, of course, because it's the Patch-Range that specifies the start-point and end-point (range) from which length can be calculated. Using your reference to rfc7231 semantics ("In HTTP, the term "content" refers to the contents of the resource"), this issue goes away because we're clearly talking about the length of the patch.
  2. But that leads me to Patch-Range. I wonder if we should keep it as "Content-Range", by using the same logic? It is the range of the content (at the resource) which the patch is targeting, not a range on the content of the patch itself.

So if we'd like to keep the semantics clear, I think we should have:

Patch-Length (Required)
Patch-Type (Optional)
Content-Range (Optional)

Changing your example:

Request:

   PUT /chat
   Content-Type: application/json
   Patches: 2

   Patch-Length: 53                         <-- Patch-Length
   Content-Range: json .messages[1:1]       <-- Content-Range

   [{"text": "Yo!",
     "author": {"link": "/user/yobot"}]

   Patch-Length: 326                        <-- Patch-Length
   Patch-Type: application/json-patch+json  <-- Patch-Type

   [
     { "op": "test", "path": "/a/b/c", "value": "foo" },
     { "op": "remove", "path": "/a/b/c" },
     { "op": "add", "path": "/a/b/c", "value": [] },
     { "op": "replace", "path": "/a/b/c", "value": 42 },
     { "op": "move", "from": "/a/b", "path": "/a/d },
     { "op": "copy", "from": "/a/d/d", "path": "/a/d/e" }
   ]

canadaduane avatar Feb 25 '21 20:02 canadaduane

Yeah I think I like that idea. A lot of the examples in the spec at the moment have content-type: application/json and then send a patch which might be encoded in some other way - which seems confusing. The logic for "does this response replace the document or does it modify it" would be a bit simpler.

The thing I like the most about this idea is that its clear about the separation between parts of the response. Right now as I mentioned in #96 there are 3 levels of HTTP headers in subscriptions:

TTP/1.1 209 Subscription
content-type: text/plain
transfer-encoding: chunked # < --- 1. real http headers

content-type: application/json # <--- 2. version/update level headers
version: "abc"
patches: 1

patch-length: xxx # <-- 3. patch level headers
patch-range: yyy

patch contents

I feel like the current spec is a little ambiguous about which headers go where, and when. With this change Patch-Length: is describing the number of bytes in a patch. Version: names the resulting version after a transaction has been applied. And so on.

josephg avatar Feb 26 '21 01:02 josephg

This change seems to have strong support (as it is mentioned as a dependency in other issues as well), and is probably ready for a PR to be drafted.

However, @canadaduane notes that we might want to keep Content-Range instead of Patch-Range:, because the range is still referring to a range of content. So I think we have two options:

Option 1: keep Content-Range

   Patch-Length: 53                         <-- Patch-Length
   Content-Range: json .messages[1:1]       <-- Content-Range

   [{"text": "Yo!",
     "author": {"link": "/user/yobot"}]

   Patch-Length: 326                        <-- Patch-Length
   Patch-Type: application/json-patch+json  <-- Patch-Type

   [
     { "op": "test", "path": "/a/b/c", "value": "foo" },
     { "op": "remove", "path": "/a/b/c" },
     { "op": "add", "path": "/a/b/c", "value": [] },
     { "op": "replace", "path": "/a/b/c", "value": 42 },
     { "op": "move", "from": "/a/b", "path": "/a/d },
     { "op": "copy", "from": "/a/d/d", "path": "/a/d/e" }
   ]

Option 2: rename to Patch-Range

   Patch-Length: 53                         <-- Patch-Length
   Patch-Range: json .messages[1:1]         <-- Patch-Range

   [{"text": "Yo!",
     "author": {"link": "/user/yobot"}]

   Patch-Length: 326                        <-- Patch-Length
   Patch-Type: application/json-patch+json  <-- Patch-Type

   [
     { "op": "test", "path": "/a/b/c", "value": "foo" },
     { "op": "remove", "path": "/a/b/c" },
     { "op": "add", "path": "/a/b/c", "value": [] },
     { "op": "replace", "path": "/a/b/c", "value": 42 },
     { "op": "move", "from": "/a/b", "path": "/a/d },
     { "op": "copy", "from": "/a/d/d", "path": "/a/d/e" }
   ]

That's for responses. What about Requests?

I am also noticing that we have only been examining responses, here, and we should consider if this change will be symmetric for requests. Requests have always used the Range: header, instead of Content-Range: of responses. So again, we have two options: (1) preserve the Range: header, or (2) rename to Patch-Range:.

Option 1: Range

         PUT /chat
         Version: "g09ur8z74r"                              | Update
         Parents: "ej4lhb9z78"                              |
         Content-Length: 189                                |
         Content-Type: application/json                     |
         Merge-Type: sync9                                  |
         Patches: 2                                         |
                                                            |
         Patch-Length: 53                                   | | Patch
         Range: json=.messages[1:1]                         | |
                                                            | |
         [{"text": "Yo!",                                   | |
           "author": {"link": "/user/yobot"}]               | |
                                                            |
         Patch-Length: 40                                   | | Patch
         Range: json=.latest_change                         | |
                                                            | |
         {"type": "date", "value": 1573952202370}           | |

Option 2: Patch-Range

         PUT /chat
         Version: "g09ur8z74r"                              | Update
         Parents: "ej4lhb9z78"                              |
         Content-Length: 189                                |
         Content-Type: application/json                     |
         Merge-Type: sync9                                  |
         Patches: 2                                         |
                                                            |
         Patch-Length: 53                                   | | Patch
         Patch-Range: json=.messages[1:1]                   | |
                                                            | |
         [{"text": "Yo!",                                   | |
           "author": {"link": "/user/yobot"}]               | |
                                                            |
         Patch-Length: 40                                   | | Patch
         Patch-Range: json=.latest_change                   | |
                                                            | |
         {"type": "date", "value": 1573952202370}           | |

Consider also that a Range-PUT request without the Patches header is allowed in today's HTTP, (called a Partial PUT in HTTP rfc9110) and I presume looks like this:

         PUT /chat
         Content-Type: application/json
         Content-Length: 189
         Range: json=.messages[1:1]

         [{"text": "Yo!",
           "author": {"link": "/user/yobot"}]

But if we express that with the Braid Patches: N header, and separate each patch out, then our question of whether to use Range: or Patch-Range: comes up again:

         PUT /chat
         Content-Type: application/json
         Patches: 1

         Patch-Length: 189
         Range: json=.messages[1:1]          <-- or Patch-Range:?

         [{"text": "Yo!",
           "author": {"link": "/user/yobot"}]

It might be desirable to always use Range: header in requests, rather than switching to Patch-Range in these new situations where we include a Patches: N header.

If I remove the chair hat, I am personally leaning slightly in favor of preserving the existing names for Range, and only transitioning:

  • Content-{Length,Type} → Patch-{Length,Type}

instead of:

  • Content-{Length,Type,Range} → Patch-{Length,Type,Range}.

But it would help to hear from other people on this.

toomim avatar Aug 19 '23 08:08 toomim

Perhaps we should petition HTTPbis to change the definition of Partial PUT, and require a Patch-Length: instead of Content-Length: header. Two reasons:

  • ~~This is not the length of the content, and should not be misconstrued as such.~~
  • It might help legacy servers fail when receiving a Partial PUT, rather than blithely obliterating accepting the range as a replacement for the entire contents of the resource.

toomim avatar Aug 19 '23 08:08 toomim

Oh... the most recent HTTP RFC 9110 has updated the definition of "content". I'm not sure my past interpretations of "content" are correct anymore. This deserves re-evaluation before making a choice.

toomim avatar Aug 19 '23 08:08 toomim

~~Here's a quick draft implementation: https://github.com/braid-org/braid-spec/pull/114/commits/d250ecd1640dfa964c813a798121f6fd4bf0a238~~

Edit: I made a better PR 120. See below.

toomim avatar Aug 19 '23 08:08 toomim

FYI, with Range GET request, Content-Length corresponds to the subset of whole content, the amount which is send in the response, not the size of the original content on the server over which you are doing a range: https://httpwg.org/specs/rfc7233.html#rfc.section.4.1

See example response there:

Content-Range: bytes 21010-47021/47022
Content-Length: 26012

The size of the content on the server is 47022, but the "patch" size is 26012.

mitar avatar Aug 19 '23 17:08 mitar

Yes, mitar is right. It turns out that "Content" actually refers to the content of the HTTP message, not the resource on the server. (It's a coincidence that "the content of a message" often happens to be "the thing on the server.")

Also, it turns out that the word "Content" in "Content-Length" is not to be taken seriously according to rfc9110:

Note: Some field names have a "Content-" prefix. This is an informal convention; while some of these fields refer to the content of the message, as defined above, others are scoped to the selected representation (Section 3.2). See the individual field's definition to disambiguate.

The definition for Content-Length explains that it sometimes refers to the size of the resource on the server, but in other cases is just used for message framing:

When transferring a representation as content, Content-Length refers specifically to the amount of data enclosed so that it can be used to delimit framing (e.g., Section 6.2 of [HTTP/1.1]). In other cases, Content-Length indicates the selected representation's current length, which can be used by recipients to estimate transfer time or to compare with previously stored representations.

toomim avatar Aug 20 '23 22:08 toomim

I spent quite a few hours thinking through all the different naming options. It turns out to be fairly complicated, because these headers can appear at the top level, or within a block of patches.

  • At the top-level, we want to be compatible with existing HTTP
  • Within a block of patches, we have an opportunity to define new and better header names
  • But it would be nice to make top-level header names consistent, if not equivalent, to the patches-level names

I tried to strike a balance here, and came up with the following PR: https://github.com/braid-org/braid-spec/pull/120

This gives us:

  • Preservation of existing HTTP semantics and existing HTTP header names
  • A simple way to convert into Patches: N notation for additional expressive power
  • A solution to the issue of Partial PUTs on legacy servers

This probably is not the end of the discussion, but maybe it's enough for version 03 of the draft. Thoughts?

toomim avatar Aug 21 '23 02:08 toomim

I just discovered that the HTTP WG is noticing problems with the ambiguity of Content-Length, and has been considering designing a new header to fix them. Here's a draft proposal: https://datatracker.ietf.org/doc/html/draft-nottingham-bikeshed-length

This indicates might be support in solving the Content-Length issue deeply within HTTP. Perhaps we want to borrow that deeper solution, or contribute to it, to solve our problem in message framing as well.

Also, HTTP/2 has its own message framing system. Headers and are transmitted in a "headers frame", and content in a "data frame", that says how long it is in binary, and thus doesn't need a content-length or patch-length field to frame it.

toomim avatar Aug 21 '23 07:08 toomim