jqr

jq return type

Open cboettig opened this issue 6 years ago • 10 comments

It looks like the return type of a jq query is a jqson object, which is essentially a string.

I think it would be far more useful for jq to return a proper JSON object (or a list from parsing the JSON, though it's better to leave the user in control of the `simplify*` options of `fromJSON`).

Under the current interface, it looks like one has to convert the string to a textConnection and then parse it with jsonlite::stream_in:


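```r
library(jqr)       # jq()
library(jsonlite)  # stream_in()
library(magrittr)  # %>%

str <- '[{
    "foo": 1,
    "bar": 2
  },
  {
    "foo": 3,
    "bar": 4
  },
  {
    "foo": 5,
    "bar": 6
}]'

# jq() returns one JSON value per element; stream_in() needs a connection,
# so the result is wrapped in textConnection() first
out <- jq(str, ".[]") %>% textConnection() %>% stream_in(simplifyDataFrame = FALSE)

str(out)
```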

I think this is unnecessarily cumbersome. It would be better if the result were just a valid JSON string.

cboettig • Aug 10 '17 22:08

Thanks @cboettig

This is a tough one. There are many cases where the output of a jq query is not proper JSON, and it would take some work to turn that output back into proper JSON.

Is the confusion about the text connection? jsonlite::stream_in requires a connection object, so you can't pass a string to it directly.


Not sure this helps a ton, but jqr does have a function called combine:

```r
jq(str, ".[]") %>% unclass
#> [1] "{\"foo\":1,\"bar\":2}" "{\"foo\":3,\"bar\":4}" "{\"foo\":5,\"bar\":6}"
jq(str, ".[]") %>% combine() %>% unclass
#> [1] "[{\"foo\":1,\"bar\":2}, {\"foo\":3,\"bar\":4}, {\"foo\":5,\"bar\":6}]"
```

It attempts to combine separate valid JSON objects into a single valid JSON array, but may not work in all cases.


Any thoughts, @jeroen?

sckott • Aug 11 '17 23:08

Thanks @sckott, can you give some examples where it's not proper JSON (i.e. not proper JSON fragments that we can handle with stream_in)? Is the return type of a jq query documented anywhere?

I'd be happy with a generic named-list format (e.g. what you get from parsing JSON anyway); it just makes sense to have a structured, subsettable object instead of a raw string.

cboettig • Aug 14 '17 19:08

I don't think I want to convert to a list; I'd rather leave that to the user here.

I guess you're right: it seems like most, if not all, outputs can be handled by jsonlite::stream_in.

So perhaps there are two approaches:

  • if a single JSON object comes out, use jsonlite::fromJSON to convert it to a list/data.frame
  • if more than one JSON object comes out, either:
    • use jqr::combine to try to combine them, then use jsonlite::fromJSON
    • use jsonlite::stream_in
    • use an apply-family function on each output element, applying jsonlite::fromJSON
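The approaches listed above can be sketched in R as follows. This is a minimal sketch that uses only jsonlite, under two assumptions: a plain character vector with one JSON value per element stands in for the output of jq(str, ".[]") (the same shape unclass() shows earlier in the thread), and a simple paste() stands in for jqr::combine, so it runs without jqr installed.

```r
library(jsonlite)

# Stand-in for jq(str, ".[]"): one JSON value per element (assumption)
stream <- c('{"foo":1,"bar":2}', '{"foo":3,"bar":4}', '{"foo":5,"bar":6}')

# 1. Combine into a single JSON array (roughly what jqr::combine aims for),
#    then parse once with fromJSON()
combined <- paste0("[", paste(stream, collapse = ","), "]")
df_combined <- fromJSON(combined)

# 2. Treat the values as newline-delimited JSON and read them with stream_in(),
#    which needs a connection rather than a string
df_streamed <- stream_in(textConnection(stream), verbose = FALSE)

# 3. Parse each value separately with an apply-family function
parsed_list <- lapply(stream, fromJSON)

str(df_combined)
str(parsed_list[[1]])
```

Options 1 and 2 both end in a data frame by default, while option 3 keeps one parsed object per jq output value.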

sckott • Aug 14 '17 23:08

So it sounds like the result is always one or more JSON objects. I do think it is a bit confusing to give that a jqson class whose print method makes it look like a single JSON object: the print method adds [] bounding brackets and commas that aren't actually there in the returned object:

```r
jq(str, ".[]")
```

in the above example prints:

```
#> [
#>     {
#>         "foo": 1,
#>         "bar": 2
#>     },
#>     {
#>         "foo": 3,
#>         "bar": 4
#>     },
#>     {
#>         "foo": 5,
#>         "bar": 6
#>     }
#> ]
```

which is valid JSON, and tricks the user into thinking that fromJSON can work on it.

I think it would be much cleaner if the internal object string were identical to the printed string (i.e. a valid JSON string holding an array of the individual JSON values). Then the user would just have a string they could work with via fromJSON, or however they wanted to work with it. Does that make sense?
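The difference between what is stored and what is printed can be checked directly with jsonlite. A minimal sketch, again assuming a plain character vector as a stand-in for the elements of the jqson result: joining the values with whitespace (the way jq serialises a stream) does not give a document fromJSON() accepts, while wrapping the same values in [ ] with commas, as the print method displays them, does.

```r
library(jsonlite)

# Stand-in for the elements of jq(str, ".[]") (assumption)
values <- c('{"foo":1,"bar":2}', '{"foo":3,"bar":4}', '{"foo":5,"bar":6}')

# What is actually stored: separate values. Joined with newlines they form a
# JSON stream rather than one JSON document, so fromJSON() rejects the text
as_stream <- paste(values, collapse = "\n")
stream_ok <- !inherits(try(fromJSON(as_stream), silent = TRUE), "try-error")

# What the print method shows: one JSON array holding all the values
as_array <- paste0("[", paste(values, collapse = ","), "]")
array_df <- fromJSON(as_array)

stream_ok       # whether the stream text parsed as a single document
nrow(array_df)  # one row per value in the array
```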

cboettig • Aug 14 '17 23:08

@cboettig can you install the branch with devtools::install_github("ropensci/jqr@return-type")

and try it again? See the README: https://github.com/ropensci/jqr/tree/return-type#return-valid-json-or-not

Thoughts, @jeroen @richfitz, on changing the output format?

Note:

  • [ ] if we go this route, we need to fix: the jsonify param is, for some reason, not accessible when using the high-level interface, whether piping or not

sckott • Aug 15 '17 17:08

This looks great! Having a valid json object returned makes the interface really nice.

What's the use case for jsonify=FALSE? It seems like in that case it would make sense to return an array of strings (a character vector) so that one can iterate on it with typical R functions.

I'm confused about why one needs a variable assignment first in the pipe examples; I guess that forces jq to evaluate? Could we instead add another function call to the pipeline that would do this?

cboettig • Aug 15 '17 19:08

> What's the use case for jsonify=FALSE? It seems like in that case it would make sense to return an array of strings (a character vector) so that one can iterate on it with typical R functions.

That's what jq actually returns, and I'm sure some users will want what jq actually returns. In addition, some users may want separate chunks.

> I'm confused why one needs variable assignment first in the pipes examples

Because we have a nice hack that Stefan made for us via magrittr: it lets us pipe together the high-level commands, and then automagically executes the actual jq command by itself when no more pipe operations are detected. We can have custom methods/functions that toggle that off at the end, e.g., peek(), but when going to a function outside the package, like jsonlite::fromJSON, we'd need to change that function to toggle off our pipe magic, so I'm not sure we can really do that.

sckott • Aug 15 '17 19:08

Right, good point that we should be consistent with the original jq behavior. I finally read the docs:

> Data in jq is represented as streams of JSON values - every jq expression runs for each value in its input stream, and can produce any number of values to its output stream.
>
> Streams are serialised by just separating JSON values with whitespace. This is a cat-friendly format - you can just join two JSON streams together and get a valid JSON stream.
>
> If you want to get the output as a single array, you can tell jq to “collect” all of the answers by wrapping the filter in square brackets:

```sh
jq '[.[] | {message: .commit.message, name: .commit.committer.name}]'
```

So I should probably have been constructing my queries that way to begin with, and I would have gotten a single JSON value. Maybe it is better to document this (e.g. using [.[]] instead of .[], etc.) rather than introduce the toggle behavior?
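Collecting the stream inside the filter, as suggested here, can be sketched with jqr's low-level jq() call. This assumes jqr and jsonlite are installed and that jq() returns a character vector with one element per output value, as the unclass() output earlier in the thread shows; as.character() drops the jqson class before parsing.

```r
library(jqr)
library(jsonlite)

json <- '[{"foo": 1, "bar": 2}, {"foo": 3, "bar": 4}, {"foo": 5, "bar": 6}]'

# ".[]" emits a stream: three separate JSON values
streamed <- jq(json, ".[]")

# "[.[]]" collects the same values into a single JSON array
collected <- jq(json, "[.[]]")

length(as.character(streamed))   # one element per value
length(as.character(collected))  # a single element

# The collected result is one valid JSON document, so fromJSON() can parse it
# directly, and the simplify* options stay under the user's control
df  <- fromJSON(as.character(collected))
lst <- fromJSON(as.character(collected), simplifyVector = FALSE)
```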

I see that for a command-line tool, streams of JSON objects make the ideal output format, since they are (bash-shell) pipe-friendly and cat-friendly. I do think it is a weird default for an R tool, though, since that's not how pipes or iterators work in R.

Nitpick, but I think the terms jqson and jsonify add to the confusion; the output is "streams of JSON values", and that term is more precise.

Re the magrittr stuff: right, I was wondering whether the jq package might wrap fromJSON slightly into something pipe-friendly?
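A thin wrapper along these lines might look like the following. jq_parse() is a hypothetical helper, not part of jqr; it assumes its input is the already-evaluated result of a low-level jq() call (a character vector with one JSON value per element), so it sidesteps the lazy high-level pipe interface discussed above. Taking the data as its first argument keeps it pipe-friendly, and extra arguments are forwarded to jsonlite::fromJSON() so the simplify* options stay in the user's hands.

```r
library(jsonlite)

# Hypothetical helper (not a jqr function): parse every value of a jq result.
# `x` is a character vector (e.g. a jqson object) with one JSON value per element;
# `...` is passed straight through to fromJSON() (simplifyVector, etc.).
jq_parse <- function(x, ...) {
  values <- as.character(x)        # drop the jqson class, keep the strings
  if (length(values) == 1) {
    return(fromJSON(values, ...))  # a single value: parse and return it directly
  }
  lapply(values, fromJSON, ...)    # a stream: one parsed element per value
}

# Usage with a stand-in for jq(str, ".[]") (assumption)
out <- jq_parse(c('{"foo":1,"bar":2}', '{"foo":3,"bar":4}'), simplifyVector = FALSE)

# Usage with a stand-in for jq(str, "[.[]]"), i.e. a single collected array
single <- jq_parse('[{"foo":1},{"foo":3}]')
```

Unwrapping the single-value case mirrors the two cases listed earlier in the thread, but it means the return shape depends on the input length; always returning a list would be the more predictable design.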

cboettig • Aug 15 '17 20:08

Thanks for the feedback, @cboettig. We're pushing a major version to CRAN now with jq unbundled (it links to the version on the user's machine), so we'll come back to this for v1.1 or so.

sckott • Sep 19 '17 17:09

Work is still on the branch https://github.com/ropensci/jqr/compare/return-type. I just merged in from master; I need to clean up and see how the previous approach will work with the new jq/jqr setup (see R/jq.R).

sckott • Oct 19 '18 23:10