Passing the --raw-output jq flag so that json -> csv conversion can be "pushed down" to jq, with jqr picking up the non-json results

Open mskyttner opened this issue 3 years ago • 4 comments

This is an issue, or perhaps a feature request, about having jqr support raw output from jq, where the return type is not json. The use case is to use jqr to "push down" some queries/work to jq, which requires jqr to support the "--raw-output" option when the result is, for example, csv data.

An example of such a use case is working with a large json(ish) file, getting only some elements, converting those to csv (with all these steps pushed down to jq) and then from jqr picking up these non-json raw results.

In bash this would be a command similar to this one: cat jsons | jq -r '[.id, .orcid] | @csv', and here is an illustration from R with some example data:

library(jqr)
library(readr)

# data is not valid (nd)json, it is what it is..., and looks like this:
jsons <- 
  '{
    "id": "u1003nxf",
    "orcid": "",
    "profile": {
        "firstName": "John",
        "lastName": "Doe"
    }
}
{
    "id": "u1002cfh",
    "orcid": "",
    "profile": {
        "firstName": "Jo",
        "lastName": "Doe"
    }
}
'

# attempting to use jqr, the "--raw-output" is not amongst the jq_flags()
# so the output always gets returned as if it was json, with quotes, ...
# which is a bit cumbersome to dejsonify after this has happened...

readr::read_lines(jsons) %>% 
  jqr::jq("[.id, .orcid] | @csv") %>%
  as.character()
# [1] "\"\\\"u1003nxf\\\",\\\"\\\"\"" "\"\\\"u1002cfh\\\",\\\"\\\"\""

# this CLI invocation of jq works for converting jsons to csv
# but it bypasses jqr completely
writeLines(jsons, "jsons")
csv <- system("cat jsons | jq -r '[.id, .orcid] | @csv'", intern = TRUE)
readr::read_csv(csv, col_names = c("id", ".orcid"))

# A tibble: 2 x 2
#id       .orcid
#<chr>    <lgl> 
# 1 u1003nxf NA    
# 2 u1002cfh NA

# there doesn't seem to be a straightforward way to do this using `jqr` currently?
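
The CLI call above can also be wrapped in a small helper so that the query and the path are quoted instead of being pasted into one shell string. This is only a sketch: `jq_raw()` is a hypothetical name, not a jqr function, and it assumes a `jq` binary is available on the PATH. Like the `system()` call, it still bypasses jqr completely.

```r
# Hypothetical helper (not part of jqr): run the jq CLI with --raw-output.
# Assumes a `jq` executable can be found on the PATH.
jq_raw <- function(path, program) {
  stopifnot(file.exists(path), nzchar(Sys.which("jq")))
  # shQuote() protects the jq program and the file path from the shell
  system2("jq", args = c("-r", shQuote(program), shQuote(path)), stdout = TRUE)
}

# usage, with the "jsons" file written above:
# csv_lines <- jq_raw("jsons", "[.id, .orcid] | @csv")
```

The result is a character vector with one raw csv line per input object, which can then be handed to a csv reader.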

Workarounds or fixes

jq supports these command line options, as noted in a related issue. In jqr, however, the jq_flags are currently more like jv_dump_string_flags(), as described here.

So "expected" jq command line options are not currently passed in jqr, since they're hardcoded and set to 0 here unlike when jq runs at the command line.

Because of this, there doesn't seem to be a way to pass a "--raw-output" option through jqr right now. Everything gets converted to json first.

If the jq query/program, when it is processed, could receive a parameter carrying the jq options/flags, it could branch and use jv_string_value for raw output instead of passing everything to jv_dump_string. Right now everything goes to jv_dump_string, which I think is what causes the "double quoting" of quote characters before these non-json results are returned to jqr?
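
Until raw output is supported, one R-side workaround may be to undo the extra layer of json encoding after the fact. Each element jqr returns for an `@csv` query is itself a valid json string literal, so parsing it as json should give back the raw line that `jq -r` would print. The sketch below assumes the jsonlite package is installed. `unjson()` is a hypothetical helper name, and the input is copied from the `as.character()` output shown earlier rather than produced by a live jqr call.

```r
library(jsonlite)

# Each element is one json-encoded string, copied from the as.character()
# output shown earlier in this issue.
encoded <- c(
  "\"\\\"u1003nxf\\\",\\\"\\\"\"",
  "\"\\\"u1002cfh\\\",\\\"\\\"\""
)

# Hypothetical helper (not part of jqr): parse each json string literal
# back into the raw text it encodes.
unjson <- function(x) {
  vapply(x, function(s) jsonlite::fromJSON(s), character(1), USE.NAMES = FALSE)
}

csv_lines <- unjson(encoded)

# Read the decoded lines as csv; all columns are kept as character here.
df <- utils::read.csv(text = csv_lines, header = FALSE,
                      col.names = c("id", "orcid"), colClasses = "character")
```

This costs an extra encode/decode round trip per result, so it is a stopgap and not a replacement for a real "--raw-output" option.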

Related issues

I think this issue relates to these other issues (with the variation that it would like to be able to use jqr to pass the --raw-output flag/option and then expect the return type to not be json in order to support a jq query push down which uses "|@csv"):

  • https://github.com/ropensci/jqr/issues/30
  • https://github.com/ropensci/jqr/issues/56
  • https://github.com/ropensci/jqr/issues/70

mskyttner avatar Dec 07 '20 11:12 mskyttner

Thanks for the issue!

It's not clear if we can support raw output or not. The flags stuff is a bit beyond my comprehension of jq and our interface with it. I'd like to support this, but probably can't sort this out myself.

Do you know of a proposed fix?

sckott avatar Dec 07 '20 19:12 sckott

I tried to untangle it a little, but I'm not so sure about the low-level stuff. I think a proposed fix would perhaps involve steps like:

  • Add a parameter (for passing "raw-output" and other such jq options/flags) here, basically adding code to do more of what is done here in main.c in a similar way, I guess, including dealing with some of those errors. Right now it looks like a cleverly simplified version which avoids dealing with the errors and skips the raw output steps/branching.
  • Change the flag passing, which bakes some things together right now and appears to mix jq command line options with "jv_dump_string flags" at a higher level, then splits them out a bit further in, here.
  • Somehow make sure these flags, for example for "--raw-output", get passed all the way through from R, especially in the process step when a program is being run... where the value currently gets set to 0.
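
The branching described in these steps can be illustrated in R, purely as a conceptual model of the C-level change. As far as jq's command line behaviour goes, "--raw-output" only affects results that are strings; everything else is still printed as json. In the sketch below, `emit_result()` and `raw_output` are hypothetical names, and jsonlite stands in for jv_dump_string. It is not jqr or jq code.

```r
library(jsonlite)

# Conceptual sketch only: `emit_result()` and `raw_output` are hypothetical.
emit_result <- function(value, raw_output = FALSE) {
  is_string <- is.character(value) && length(value) == 1L
  if (raw_output && is_string) {
    # raw branch: hand back the string itself, unquoted
    value
  } else {
    # default branch: serialize to json text (strings get quoted and escaped)
    as.character(jsonlite::toJSON(value, auto_unbox = TRUE))
  }
}

csv_line <- "\"u1003nxf\",\"\""
emit_result(csv_line, raw_output = TRUE)  # the csv line as-is
emit_result(csv_line)                     # the same line, json-encoded again
```

The second call shows the "double quoting" effect: the csv line is a string, so serializing it as json wraps it in another layer of quotes and escapes.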

Not brave enough to do a PR on it though, I'm afraid.

mskyttner avatar Dec 07 '20 21:12 mskyttner

My workaround for now, when using jq with "--raw-output" while converting to CSV, is to "shell out" to a 4 MB docker image and pass the results back to R:

# workaround but depends on docker and a 4 MB docker image with jq
# attempted to use stevedore first, but ran into issues with the command splitter
# and with capturing output from the command

# function to enable running "jq" with --raw-output through docker
docker_cli_runc <- function(slug, command, v_host, v_container) {
  
  stopifnot(file.exists(v_host))
  
  cli_runc <- function(slug, command, v_host, v_container)
    sprintf("docker run --rm -v %s:%s %s %s", 
      v_host, v_container, slug, command)
  
  cmd <- cli_runc(slug, command, v_host, v_container)
  
  system(cmd, intern = TRUE)
  
}

# small example data
jsons <- 
  '{
    "id": "u1003nxf",
    "orcid": "",
    "profile": {
        "firstName": "John",
        "lastName": "Doe"
    }
}
{
    "id": "u1002cfh",
    "orcid": "",
    "profile": {
        "firstName": "Jo",
        "lastName": "Doe"
    }
}
'
# make it available on the host's disk
readr::write_lines(jsons, "~/temp/jsons")

# test running jq with --raw-output
docker_cli_runc(
  slug = "docker.io/endeveit/docker-jq",
  command = "cat /tmp/jsons | jq -r '[.id, .orcid] | @csv'",
  v_host = "~/temp/jsons",
  v_container = "/tmp/jsons"
)

# poor man's wrapper
jq <- function(file, query) {
  command <- sprintf("cat /tmp/jsons | %s", query)
  docker_cli_runc(
    slug = "docker.io/endeveit/docker-jq",
    command = command,
    v_host = file, v_container = "/tmp/jsons")  
}

# using it on arbitrary json files
library(magrittr)

"~/temp/jsons" %>% 
  jq("jq -r '[.id, .orcid] | @csv'") %>%
  readr::read_csv(col_names = c("id", "orcid"))

mskyttner avatar Dec 09 '20 12:12 mskyttner

Thanks - I will try to have a look soon, no promises

sckott avatar Dec 16 '20 16:12 sckott