url Allow encoding of null values in `application/x-www-form-urlencoded`.

https://github.com/whatwg/url/blob/fdaa0e5a3790693a82f578d7373f216d8fef9ac8/url.bs#L2933-L2934

Some systems want to encode null values.

e.g.

parse("x=10&y") => {x => "10", y => nil}

differently from empty strings, e.g.

parse("x=10&y=") => {x => "10", y => ""}

I wonder if we can amend this rule to allow optional special interpretation of keys without = separators to indicate that the value is null.

Consider adding the wording:

"empty byte sequence or null value depending on what is most appropriate for the language environment to represent the lack of a value".

Or something to that effect.
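
To make the distinction concrete, here is a minimal Python sketch. It assumes Python's standard `urllib.parse.parse_qsl` as a stand-in for a string-only parser (it is not the WHATWG algorithm itself), and `parse_with_null` is a hypothetical helper illustrating the proposed interpretation, not an API of Rack, Grape, or the URL standard.

```python
from typing import List, Optional, Tuple
from urllib.parse import parse_qsl, unquote_plus


def parse_with_null(query: str) -> List[Tuple[str, Optional[str]]]:
    """Hypothetical parser: a key with no "=" yields None instead of ""."""
    pairs = []
    for part in query.split("&"):
        if not part:
            continue  # skip empty components, as most parsers do
        name, sep, value = part.partition("=")
        # sep is "" when the component contained no "=" at all
        pairs.append((unquote_plus(name), unquote_plus(value) if sep else None))
    return pairs


# A string-only parser cannot tell the two inputs apart.
standard_bare = parse_qsl("x=10&y", keep_blank_values=True)
standard_empty = parse_qsl("x=10&y=", keep_blank_values=True)

# The proposed interpretation keeps them distinct.
proposed_bare = parse_with_null("x=10&y")    # y maps to None
proposed_empty = parse_with_null("x=10&y=")  # y maps to ""
```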

See https://github.com/ruby-grape/grape/issues/2298 and https://github.com/rack/rack/issues/1696 for more details/discussion over the past 2 years.

Jan 04 '23 04:01 ioquatix

After "searching the literature" I found a similar issue here: https://github.com/whatwg/url/issues/427

Jan 04 '23 05:01 ioquatix

That this is about application/x-www-form-urlencoded is not immediately clear.

This was discussed in #469 and I stand by https://github.com/whatwg/url/issues/469#issuecomment-627212141. If you want null, omit the key.

I can appreciate that it's difficult to migrate existing software though. But also, in theory there's nothing stopping servers from interpreting a URL's query however they wish. They just can't claim to support application/x-www-form-urlencoded then.

Jan 04 '23 08:01 annevk

To a certain extent, I think your comment is fair, but where that breaks down is:

Encoding arrays with embedded null values, e.g. encode({x: [1, null, 2]}) => x[]=1&x[]&x[]=2
Explicitly specifying something is null: perhaps the remote end assumes a default value unless otherwise specified, and if you can't explicitly specify null, the default is assumed (so the absence of a key doesn't imply null).
Round-tripping the exact data structure. It's impossible to serialise {x: null} and deserialise the same data on the remote end.

With the proposed changes, it's possible to support these use cases. Otherwise, for most other cases that I'm aware of, simply omitting the value is sufficient.
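
As a rough illustration of the round-tripping point, the following Python sketch serialises a null as a bare key and reads it back. The `encode` and `decode` helpers are hypothetical and only model the proposed convention. They are not how Rack or any other library is implemented.

```python
from typing import List, Optional, Tuple
from urllib.parse import quote_plus, unquote_plus

Pair = Tuple[str, Optional[str]]


def encode(pairs: List[Pair]) -> str:
    """Serialise pairs; None becomes a bare key with no "=" (assumed convention)."""
    parts = []
    for key, value in pairs:
        name = quote_plus(key, safe="[]")  # leave "[]" unescaped for readability
        parts.append(name if value is None else name + "=" + quote_plus(value))
    return "&".join(parts)


def decode(query: str) -> List[Pair]:
    """Inverse of encode: a component without "=" decodes to None."""
    pairs: List[Pair] = []
    for part in query.split("&"):
        if not part:
            continue
        name, sep, value = part.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value) if sep else None))
    return pairs


data: List[Pair] = [("x[]", "1"), ("x[]", None), ("x[]", "2")]
wire = encode(data)       # "x[]=1&x[]&x[]=2"
restored = decode(wire)   # equal to data, including the None in the middle
```

Under a string-only model the middle entry would come back as an empty string, so the original structure could not be recovered.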

I made this issue because when we standardised Rack (v3) to follow the current application/x-www-form-urlencoded standard, downstream systems were broken. So, to your point - "migrate existing software" can be tricky.

Either way, I don't personally have a strong opinion about it, but it would be convenient and would make the application/x-www-form-urlencoded format more useful and complete. At a minimum, a specific documented stance on allowing or disallowing this would tell users of URL query parsers what they should and shouldn't expect, and where they might want to reach for something more elaborate (e.g. JSON). For example, the alternative to this proposal is to explicitly call out as invalid the practice of treating a bare key such as x as a null value.

I would like to know: before this specification, was application/x-www-form-urlencoded specified anywhere? It does seem to me that some languages are adopting x by itself as a de facto representation of {x: null}. Some research from 2 years ago: https://github.com/rack/rack/issues/1696#issuecomment-661699610

Jan 04 '23 10:01 ioquatix

The key is that the standard only has a single model: a string-to-string(s) multimap. Each entry in this multimap has a string key, and its value is one or more strings.

This model does not support:

The value being null
The value being a list containing null
The value being a zero-sized list
The value being a string, instead of a list of strings
Any special treatment of keys that end in "[]"
Or anything else, like the value being a number, or boolean, or whatever.

In particular, this model is not by itself capable of representing string-keyed maps to arbitrary values in your target language. The value space for each entry is much more limited.
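
For a feel of what a string-to-string(s) multimap looks like in practice, here is a small Python sketch. It uses the standard library's `urllib.parse.parse_qs`, which is assumed here only as an analogous model and is not the URL standard's own parser.

```python
from urllib.parse import parse_qs

# Every value is a list of strings, even when there is only one.
multi = parse_qs("foo=bar&foo=baz", keep_blank_values=True)
# {'foo': ['bar', 'baz']}

# Keys ending in "[]" get no special treatment: the brackets stay in the key.
brackets = parse_qs("x[]=1&x[]=2", keep_blank_values=True)
# {'x[]': ['1', '2']}

# A bare key has the empty string as its value, not null.
bare = parse_qs("y", keep_blank_values=True)
# {'y': ['']}

# Numbers are not typed: "10" stays a string.
number = parse_qs("n=10", keep_blank_values=True)
# {'n': ['10']}
```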

Now. You can build more complicated models on top of the standard's model! You should just do so in a layered fashion, and do so while acknowledging you are going beyond any shared web standard and thus are going to need to do a lot of outreach and consensus building among all consumers/producers you want to interoperate with. In particular, browsers have no need for a more complicated model, and so won't participate. It's easiest if you constrain your model to just one client/server pair, but maybe you want it to expand to "all client/server pairs using a specific library", or even set of collaborating libraries.

Examples of ways you could layer on top of the spec's model include:

Treating entries with multiple values for the same key as an error, unless the key ends in "[]", in which case you strip the "[]" suffix. (This one seems common in non-browser libraries. Although I haven't checked their handling of foo=bar&foo=baz; maybe they discard the bar, or discard the baz, or treat it as a list anyway even without the key ending in "[]"?)
Treating entries with a single value as a string, instead of a list containing a single string, unless the key ends with "[]". (Also seems common.)
Treating entries with a single value "null" as your language's null value, and requiring the string "null" to be encoded some other way.
Treating entries with a single value empty string as your language's null value, and requiring the empty string to be encoded some other way.
Requiring all values to be JSON-encoded, so you can represent arrays, nulls, booleans, numbers, strings, objects...
Using suffixes on the keys to denote data types, so e.g. transforming ("foo!bool", ["false"]) into ("foo", false).

Etc. The main idea is that you have a limited space to work with, when you layer on top of the standard's model. So it'll take some work to get people to agree.
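
A minimal Python sketch of the first two layering ideas might look like the following. The `layered_params` function and its error behaviour are assumptions made for illustration, with `urllib.parse.parse_qs` standing in for the underlying multimap. No existing library is claimed to behave exactly this way.

```python
from typing import Dict, List, Union
from urllib.parse import parse_qs


def layered_params(query: str) -> Dict[str, Union[str, List[str]]]:
    """Hypothetical layer on top of a string-to-strings multimap."""
    result: Dict[str, Union[str, List[str]]] = {}
    for key, values in parse_qs(query, keep_blank_values=True).items():
        if key.endswith("[]"):
            # "x[]" entries become a list stored under "x"
            result[key[:-2]] = values
        elif len(values) == 1:
            # a single value is unwrapped to a plain string
            result[key] = values[0]
        else:
            # repeated key without "[]" is rejected in this sketch
            raise ValueError("multiple values for non-list key: " + key)
    return result


params = layered_params("x[]=1&x[]=2&y=3")
# {'x': ['1', '2'], 'y': '3'}
```

Other choices (keeping the first value, keeping the last, or treating repeats as a list anyway) are equally possible, which is why such a layer needs agreement between producers and consumers.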

Jan 04 '23 11:01 domenic

@domenic I'm personally fine with that model, but in reality there is a fairly predominant pattern of people interpreting x by itself as {x: null}. As a maintainer of a popular library that previously followed that pattern and now follows the "string-to-string(s) multimap" model with extensions (layers, as you suggest, for handling arrays, etc.), we have introduced understandable pain.

Since I think we prefer to follow the standard, my only hope for those users was to propose some kind of change to the standard.

From my point of view, introducing layered interpretations doesn't help that much when everyone has their own bespoke concept of what a layer should be like - the entire point of standards is to standardise the approach so we don't end up with people building N different ways of representing null values etc. Again, I don't have a strong opinion about it, but this is the pain point and I'm just bringing it up and starting the discussion on behalf of the affected users/developers.

Jan 04 '23 11:01 ioquatix

Well, there's already a layer you're imposing on top---whatever layer is doing the "[]" stuff, and is translating lists-of-single-string-values into just string values. That isn't in this standard. My suggestion is to work with whatever community is responsible for that layer, to extend it to support additional semantics that you'd like. That community isn't really related to this repo though, so here is probably not the right place. This repo is fully about the model I outlined above, since that model is the one browsers use.

Jan 04 '23 12:01 domenic

Fair enough. I think what people are asking for is some level of standardisation of some of those layers. Otherwise as a maintainer of a shared implementation, we can't really introduce bespoke layering without being opinionated (and as you said, not following a standard).

Is there any case where browsers actually do need to specify null values? e.g. <input type="hidden" name="x"> followed by x.value = null?

Jan 04 '23 12:01 ioquatix

No, x.value = null is the same as x.value = "null" for browsers.

Jan 04 '23 12:01 domenic

I mean, if servers got together and agree on additional semantics I could see compatible extensions to URLSearchParams. Web developers could benefit from typed data as well. (Similar to how we're considering extensions to Headers for typed header values.)

I haven't seen sufficient interest in that though and I suspect most existing usage is fairly entrenched and unable to change. And yeah, given that browsers would not be the primary stakeholders but more a beneficiary this doesn't seem like the right place to drive that, but I'd be open to it if we got a group of people together that's somewhat representative of that space.

Jan 04 '23 12:01 annevk

I think the original semantics (x itself) I described are implemented in a number of different frameworks/libraries/languages already, but whether that's enough to constitute a standard, I don't know. Does any browser generate x instead of x= for an empty value?

Jan 04 '23 12:01 ioquatix

Thanks @ioquatix for bringing this up from Grape and @domenic and @annevk for your comments!

Y'all make a ton of sense, even though I think you are too focused on browsers, which are like people, a bit unpredictable at times. In contrast, I think most API developers that were not forced into using application/x-www-form-urlencoded by a form filled in a browser abandoned it in favor of application/json for lack of clarity for things like nulls in URLs and different server-side interpretations. With the change introduced by Rack strictly interpreting the standard they should not be relying on query string parameters either. We can stay true to the spec.

My conclusion is that it was nice when the implementation allowed nulls in a non-ambiguous way in Grape! What @ioquatix was suggesting is to restore that, but I agree that we're extending the spec by doing so. We plan to cleanly deprecate that behavior, tell our users that it's now undefined, and call it a day.

Jan 05 '23 12:01 dblock

No, x.value = null is the same as x.value = "null" for browsers.

Is this why some systems get confused when someone actually has the name "null"? i.e. there is no way to differentiate between "null" and null. My guess is there is code that writes if params[name] == "null" return null. Maybe that is a motivation towards fixing this issue. Do HTML forms ever do this?

Some funny examples:

Jan 05 '23 23:01 ioquatix

I found a lot of NULLs in URLs, https://www.wired.com/2015/11/null/ 🤷

Jan 05 '23 23:01 dblock

If we treated pairs with no key-value delimiter as having a null value, I think we would also have to accept that in the URL http://example/?foo&&&baz&&another, the key-value pairs are:

key: "foo", value: null
key: "", value: null
key: "", value: null
key: "baz", value: null
key: "", value: null
key: "another", value: null

In other words, we would not be able to skip strings of empty pair delimiters (&&&&). AFAICT, that skipping is widely agreed upon by other URL/querystring libraries, so this would be a significant departure.
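
The difference can be sketched in Python. Here `urllib.parse.parse_qsl` is assumed as an example of a parser that skips empty components, and `parse_without_skipping` is a hypothetical parser that applies the "missing delimiter means null" rule to every component, including empty ones.

```python
from typing import List, Optional, Tuple
from urllib.parse import parse_qsl, unquote_plus


def parse_without_skipping(query: str) -> List[Tuple[str, Optional[str]]]:
    """Hypothetical: every component between "&" is a pair; no "=" means None."""
    pairs = []
    for part in query.split("&"):
        name, sep, value = part.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value) if sep else None))
    return pairs


query = "foo&&&baz&&another"

# Conventional behaviour: runs of "&" are skipped, three pairs remain.
skipping = parse_qsl(query, keep_blank_values=True)
# [('foo', ''), ('baz', ''), ('another', '')]

# Under the null rule, the empty components become pairs too: six in total.
strict = parse_without_skipping(query)
```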

To illustrate why we would need to do that:

I have an API which allows identifying each key-value pair by its position (i.e. an index) and supports index-based operations such as inserting pairs at a particular location, replacing a region of pairs, removing a particular pair, or changing the key/value of a particular pair.

Now consider the URL http://example/?foo&baz&another#frag. The API allows replacing the key component of the first pair:

url.queryParams.replaceKey(at: 0, with: "new_key")
// result: http://example/?new_key&baz&another#frag
//                         ^^^^^^^

But if you replace the key with the empty string, something interesting happens:

url.queryParams.replaceKey(at: 0, with: "")
// result: http://example/?=&baz&another#frag
//                         ^

Since empty strings of delimiters are usually skipped entirely, we must insert an = sign in order to preserve the fact that the first pair exists and its key is the empty string.

Currently this is fine, because the presence or absence of the = sign does not change the value component. If we started to say that the presence of that delimiter is meaningful, then inserting this = would also change the pair's value. The only way around it would be to allow the following result:

url.queryParams.replaceKey(at: 0, with: "")
// result: http://example/?&baz&another#frag
//                         ^

And to say that this initial & is actually a pair with (key: "", value: null).

I'm not opposed to that (in fact, the &&&-skipping can be problematic in lots of cases), but I think the two changes would need to happen together, and I think this aspect is likely to cause even more compatibility issues.
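
The need for the inserted = can be sketched with a small Python function. `replace_key` below is a hypothetical stand-in for the index-based API described above, written only to show the mechanics. It is not the actual implementation of that API.

```python
from urllib.parse import quote_plus


def replace_key(query: str, index: int, new_key: str) -> str:
    """Hypothetical: replace the key of the index-th non-empty component."""
    parts = query.split("&")
    # Empty components (from runs of "&") are skipped when counting pairs.
    live = [i for i, part in enumerate(parts) if part]
    i = live[index]
    _, sep, value = parts[i].partition("=")
    component = quote_plus(new_key) + sep + value
    if component == "":
        # An empty component would be skipped by parsers, so the pair would
        # vanish. Force a delimiter to keep the pair with its empty key.
        component = "="
    parts[i] = component
    return "&".join(parts)


renamed = replace_key("foo&baz&another", 0, "new_key")  # "new_key&baz&another"
emptied = replace_key("foo&baz&another", 0, "")         # "=&baz&another"
```

If a missing = meant null, the forced delimiter in the second call would silently turn a null value into an empty string.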

Jan 28 '23 22:01 karwa

Does any framework in existence actually do that?

Jan 29 '23 05:01 ioquatix

Do what? Interpret a plain & as (key: "", value: null)?

Not that I know of - that's why I said it would be a significant departure, but it is also a logical consequence of saying that a missing key-value delimiter means a null value.

Jan 29 '23 06:01 karwa

While I understand that the order of elements has semantic meaning, the degenerate case of && seems unimportant to me, and my question was which frameworks, if any, depend on those semantics.

Jan 29 '23 07:01 ioquatix