
multiple json objects in input?

Open filippog opened this issue 8 years ago • 9 comments

Hi, thanks for gron, looks really useful! I tried feeding in multiple json objects but noticed only the first one is reported:

$ echo -e '{"foo": "bar"}\n{"baz": "meh"}' | ./gron 
json = {};
json.foo = "bar";

Is this something that can or will be supported?

thanks!

filippog avatar Oct 10 '16 09:10 filippog

Hi @filippog! Thank you! :)

It certainly can be supported, and I think it would be a good idea.

Do you think it would be OK to enable the feature with a command line option rather than trying to auto-detect multiple objects in the input?

tomnomnom avatar Oct 10 '16 19:10 tomnomnom

If autodetection is expensive and/or unreliable when converting to/from gron, then yes, a command line option would do. Otherwise I was expecting gron to just work when fed multiple objects; my case in point is reading from an access log where every entry is a JSON object, separated by \n

filippog avatar Oct 11 '16 09:10 filippog

I currently have the same problem, in this case using it with jq:

Minimal example:

λ cat ~/gron_tmp                              
{ 
"data" : [
  {"a": "1"}, 
  {"a": "2"}
  ]
}

λ cat ~/gron_tmp |jq '.data[]'      
{
  "a": "1"
}
{
  "a": "2"
}

λ cat ~/gron_tmp |jq '.data[]' |gron
json = {};
json.a = "1";

I've also seen logfiles consisting of one JSON object per line (ndjson), but that format expects each object to be minified onto a single line.

-> my preference would be: if an object ends and a new one starts straight after (only whitespace between }...{), just take that object as well. I wouldn't even mind if there were no difference between the lines:

λ cat ~/gron_tmp |jq '.data[]' |gron
json = {};
json.a = "1";
json = {};
json.a = "2";

Although a command-line switch to implicitly treat the input as a list would be nice:

λ cat ~/gron_tmp |jq '.data[]' |gron --assume-list
json[0] = {};
json[0].a = "1";
json[1] = {};
json[1].a = "2";

ghost avatar Nov 08 '16 11:11 ghost

Hey, sorry this hasn't had the attention it needs... Kids keep you pretty busy!

I've been giving some thought to the approach needed for this, and there are probably only two sane options:

  1. Require that the input be one JSON blob per line so it's easy to split on \n
  2. Do a pre-parse step to detect multiple JSON objects in the input

Option 1 is by far the easiest to implement, but it doesn't work for @jan-schulz-k24's example where each JSON blob spans multiple lines.

Option 2 is far more permissive, but it requires a rune-by-rune inspection of the input text (you can't just, say, regex for }[^,]*{ because that sequence could appear in a string value)
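
To make that pitfall concrete (a contrived example, not taken from gron itself), the pattern happily fires inside a single, perfectly valid object, so a splitter built on it would cut the object in two:

$ echo '{"msg": "a literal }{ inside a string"}' | grep -c '}[^,]*{'
1

Here grep -c just confirms that the pattern matches even on a one-object input.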

The problem with option 2 is that it's pretty expensive to do, especially when the input is very large. This is made slightly better by only enabling multi-object input when a command line flag is specified.

On balance I think that gron working in more situations is more important than performance, so option 2 is probably best.

tomnomnom avatar Nov 08 '16 21:11 tomnomnom

jq -c leaves one object per line. So IMO

  • it would be OK if gron could work with JSON Lines/ndjson out of the box and without sacrificing speed, and
  • only switch to the expensive "multiple lines per JSON object and multiple objects" parsing if a command line flag is given
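
For illustration, reusing the ~/gron_tmp file from above (output shown as I'd expect from jq's compact mode):

λ cat ~/gron_tmp |jq -c '.data[]'
{"a":"1"}
{"a":"2"}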

ghost avatar Nov 09 '16 09:11 ghost

@jan-schulz-k24 @filippog firstly: thank you for your patience!

I've added basic support for multi-object input in b9faf397

At the moment it only supports one object per line; I'm not 100% convinced this is the best solution but it's certainly the easiest to implement.

You can use the feature with the -s/--stream flag:

tom@work:~▶ cat stream.json 
{"one": 1, "two": 2, "three": [1, 2, 3]}
{"one": 1, "two": 2, "three": [1, 2, 3]}
tom@work:~▶ gron --stream stream.json 
json = [];
json[0] = {};
json[0].one = 1;
json[0].three = [];
json[0].three[0] = 1;
json[0].three[1] = 2;
json[0].three[2] = 3;
json[0].two = 2;
json[1] = {};
json[1].one = 1;
json[1].three = [];
json[1].three[0] = 1;
json[1].three[1] = 2;
json[1].three[2] = 3;
json[1].two = 2;

Internally it reads the input line by line, so it starts to provide output as soon as a line is available to read. In the example below, the output appears in three chunks with two-second intervals between them.

tom@work:~▶ cat delay.sh 
#!/bin/bash
echo '{"one": 1, "two": 2}'
sleep 2
echo '{"three": 3, "four": 4}'
sleep 2
echo '{"five": 5, "six": 6}'
tom@work:~▶ ./delay.sh | gron -s
json = [];
json[0] = {};
json[0].one = 1;
json[0].two = 2;
json[1] = {};
json[1].four = 4;
json[1].three = 3;
json[2] = {};
json[2].five = 5;
json[2].six = 6;

This should make it possible to work with streaming HTTP APIs - most of which seem to provide one object per line.
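
For example (the endpoint here is hypothetical, shown only as an illustration), a line-delimited stream could be piped straight in:

tom@work:~▶ curl -sN https://example.com/stream | gron --stream   # hypothetical endpoint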

I haven't tagged a release yet, and I'm going to leave this issue open for a while longer because I'd like to think more about supporting objects that span many lines.

Let me know your thoughts / if you have any problems.

Thanks again for your patience at this particularly busy time in my life! :laughing:

tomnomnom avatar Nov 24 '16 13:11 tomnomnom

Thanks @tomnomnom for working on this! I did a quick test with the dataset I have and it works great with --stream!

filippog avatar Nov 24 '16 19:11 filippog

I've got JSON output with multiple objects but they're not one-per-line - gron currently only handles the first of these objects. I do have a hacky/sketchy patch for stream mode which handles this case, but I obviously don't want to step on any toes if there's another solution in the works?

rjp avatar Jan 10 '22 14:01 rjp

@rjp I also ran into this issue (specifically with the GitHub CLI tool's --paginate option), and I ended up with this hacky sed one-liner to work around it:

gh api --paginate /repos/{owner}/{repo}/environments | sed -E 's|\}\{|\}\n\{|g' | gron --stream 

noahp avatar Nov 03 '22 15:11 noahp