
toJSON fails to preserve empty names of list items

Open dariomel opened this issue 7 years ago • 12 comments

> packageVersion("jsonlite")
[1] ‘1.5’
> a=list("a",b="b")
> names(a)
[1] ""  "b"
> toJSON(a)
{"1":["a"],"b":["b"]}
> names(fromJSON(toJSON(a)))
[1] "1" "b"

In contrast, fromJSON correctly preserves such names:

> names(fromJSON('{"":["a"],"b":["b"]}'))
[1] ""  "b"

dariomel avatar Sep 07 '17 19:09 dariomel

I believe I have a similar issue where toJSON does not allow for duplicate names in an R list. For example:

> a <- list("a"=1, "a"=2)
> toJSON(a)
{"a":[1],"a.1":[2]} 

This appears to be because of the call to the function make.unique() on the last line of the cleannames() function. Would it be possible to have an option to override this?
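The renaming seen above (`a`, `a.1`) matches what `make.unique()` does to repeated names. A simplified sketch in JavaScript, for readers unfamiliar with the R function (the `makeUnique` helper is hypothetical, and it ignores `make.unique()`'s collision-avoidance corner cases, e.g. an input that already contains `"a.1"`):

```javascript
// Simplified sketch of the renaming R's make.unique() applies:
// each repeated name gets a ".1", ".2", ... suffix so keys end up distinct.
function makeUnique(names) {
  const seen = Object.create(null);
  return names.map((name) => {
    if (seen[name] === undefined) {
      seen[name] = 0;    // first occurrence keeps its name
      return name;
    }
    seen[name] += 1;     // later occurrences get a numeric suffix
    return `${name}.${seen[name]}`;
  });
}

console.log(makeUnique(["a", "a"])); // [ 'a', 'a.1' ]
```

This is why `list("a"=1, "a"=2)` serializes with keys `"a"` and `"a.1"` rather than two `"a"` keys.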

cstawitz avatar Oct 12 '18 17:10 cstawitz

This is presenting a problem for me as well. The Sheets v4 API requires that I form a request body with multiple instances of a specific schema. That is, if the input for the body has list items with the same name, that is intentional on my part. The make.unique() call is creating a malformed request.

jennybc avatar Nov 23 '19 08:11 jennybc

You mean you want toJSON() to generate duplicate JSON keys? Are you sure there is no other way?

The JSON standard is not explicit about this, but in most languages a JSON object maps to a dictionary, so the keys must be unique. Validating JSON with duplicated keys on e.g. https://jsonlint.com/ yields an error.

If you construct or parse a JSON object with a duplicated key in JavaScript, the later value silently overwrites the earlier one:

var x = {foo: 123, foo: 456}
console.log(x)
// { foo: 456 }

The JSON way of doing this is using an array of values instead of repeating the key:

var x = {foo: [123, 456]}

For this reason, jsonlite enforces unique keys. If your server really expects JSON with duplicate keys, we could add an option to opt out of this.

But be aware that JSON with duplicate keys is not proper JSON. Most languages won't be able to parse it, or, even worse, as in JavaScript, it gets parsed but data is silently lost.
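The loss happens at parse time and produces no error or warning. A minimal Node.js sketch (the key and values here are arbitrary) contrasting the duplicate-key form with the array form:

```javascript
// Duplicate keys are accepted by JSON.parse without error, but only the
// last value survives (V8, like most parsers, keeps the last one seen).
const parsed = JSON.parse('{"foo": 123, "foo": 456}');
console.log(JSON.stringify(parsed)); // {"foo":456} -- the 123 is silently gone

// The array form round-trips without any loss:
const arrayForm = '{"foo":[123,456]}';
console.log(JSON.stringify(JSON.parse(arrayForm)) === arrayForm); // true
```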

jeroen avatar Nov 23 '19 13:11 jeroen

OK, I went and attempted my task in the Sheets API Explorer, fiddled until it worked, and captured the exact curl call. Then I used httr::with_verbose() to see what my R implementation was sending out.

Basically I needed to introduce one more layer of list() in order to send multiple instances of the same schema in one batched request. So, yes, I think you are right that I do NOT need toJSON() to generate duplicate keys.

jennybc avatar Nov 23 '19 21:11 jennybc

But I wonder if you should just *error* for such JSON, instead of quietly using make.names() on it. This definitely made it harder for me to figure out why my request was failing. Maybe the user should be forced to confront this.

jennybc avatar Nov 23 '19 21:11 jennybc

> But I wonder if you should just *error* for such JSON, instead of quietly using make.names() on it.

I understand it would have helped in your example above, but it's not a good idea in general. It would be inconsistent for a serializer to error based on the data content. When the input contains supported R types (e.g. a list or data frame), the serializer (jsonlite) should be able to map it to JSON.

This is important because, in many cases, the user is working with some higher-level application or package and has no control over the JSON part. The user may not even be aware that JSON is involved under the hood, and is in no position to debug such problems.

Authors of packages that use jsonlite should be able to trust that jsonlite will consistently map any of the supported input data types to proper JSON, and that jsonlite will somehow deal with edge cases. In this case there is no intuitive mapping, but the responsibility to deal with that lies with jsonlite, not with the user.

jeroen avatar Nov 24 '19 13:11 jeroen

One last comment and then I'll let it go. And I certainly say "no" to my share of similar requests, so that's fine.

But here's my argument for rethinking this. It feels like the same set of issues I worked through when sorting out name repair for tibbles. In the end, we decided to default to .name_repair = "check_unique" (error if my names are not unique). But the caller can always request .name_repair = "minimal" (my dupes are OK) or .name_repair = "unique" (please get rid of my dupes).

I think the default should be to check for duplicate names and, if found, error. As you say, duplicate names are generally a mistake / invalid. I would expect that most of us are making JSON to satisfy some sort of explicit or implicit schema, where names have actual external meaning. So the chance that randomly altered names (repeatCell, repeatCell.1, like I got) still have the intended meaning relative to that schema seems slim. In the rare cases where they do (or there are names but they actually don't matter), the caller can indicate that.

> It would be inconsistent for a serializer to error based on the data content.

I think of this quiet, mandatory name repair as a case where the serializer is changing the data content, though.

jennybc avatar Nov 24 '19 16:11 jennybc

Was there a solution to the original question by @dariomel? In other words, is there a way for toJSON() to preserve the empty name in a list? The conversation that followed seems to have drifted into a separate question of how to deal with duplicated keys.

wei-wu-nyc avatar Jan 02 '20 18:01 wei-wu-nyc

I am trying to deprecate rjson in favor of jsonlite and this is blocking me: rjson respects duplicate names, while jsonlite apparently applies make.names().

MichaelChirico avatar Jan 09 '21 18:01 MichaelChirico

What is your use case for duplicate key names? You know that most JSON parsers won't be able to interpret that?

jeroen avatar Jan 12 '21 15:01 jeroen

Turns out the duplicates were unintentional, and hitting this surfaced a bug, so don't leave this open on my account. FWIW, a Go script was consuming the JSON with duplicates, which is why the script wasn't broken to begin with.

MichaelChirico avatar Jan 28 '21 09:01 MichaelChirico

This issue started as "empty names" and then was hijacked in a sense (no offense) to talk about duplicate names. While the latter might prove problematic to the receiving end, the former does not.

For the initial issue of "empty names": it is certainly legal in R, in Python, and likely in many other languages for a list to have an empty string as a key. Changing it is a corruption of the data, and doing so silently and outside of the spec is frustrating.

For the second issue of duplicate names (in case anyone cares), I suggest that it is not the encoder's responsibility to enforce what might be a problem for the receiver. Let the receiver error or not, but let the caller send what it thinks is the correct data. I don't think it's the messenger/encoder's responsibility to assume anything outside of the spec.

jsonlite's behavior here is also asymmetric: even granting the presumption that names should not be duplicated, why does jsonlite::fromJSON('{"a":[1],"a":[2]}') parse without error into a list with duplicate names? I don't think that behavior should change; I'm pointing out that the code enforces make.names in one place and not the other.

Finally, round-trip validation does not work. Setting aside details like data frames, vectors, and unboxing, these results are counterintuitive:

L <- list(1L, a=2L)
jsonlite::toJSON(L)
# {"1":[1],"a":[2]} 
identical(L, jsonlite::fromJSON(jsonlite::toJSON(L)))
# [1] FALSE
identical(L, jsonlite::fromJSON('{"":[1],"a":[2]}'))
# [1] TRUE

In my case, this means that in one portion of my code I need to wrap calls to toJSON() to undo this alteration of my data. Fortunately the object is flat enough that gsub() works without risk of getting it wrong, but it would be far better not to have to do this.

r2evans avatar Apr 07 '22 17:04 r2evans