gron
multiple json objects in input?
Hi, thanks for gron, looks really useful! I tried feeding in multiple json objects but noticed only the first one is reported:
$ echo -e '{"foo": "bar"}\n{"baz": "meh"}' | ./gron
json = {};
json.foo = "bar";
Is this something that can or will be supported?
thanks!
Hi @filippog! Thank you! :)
It certainly can be supported, and I think it would be a good idea.
Do you think it would be OK to enable the feature with a command line option rather than trying to auto-detect multiple objects in the input?
If autodetection is expensive and/or unreliable when converting to/from gron, then yeah, a command line option would do. Otherwise I was expecting gron to just work when fed multiple objects. A case in point for me is reading from an access log where every entry is a JSON object, separated by \n.
I currently have the same problem, in this case when using it with jq:
Minimal example:
λ cat ~/gron_tmp
{
"data" : [
{"a": "1"},
{"a": "2"}
]
}
λ cat ~/gron_tmp |jq '.data[]'
{
"a": "1"
}
{
"a": "2"
}
λ cat ~/gron_tmp |jq '.data[]' |gron
json = {};
json.a = "1";
I've also seen logfiles consisting of one json object per line (ndjson), but that expects minified json.
-> My preference would be that if an object ends and a new one starts straight after it (only whitespace between }...{), gron just takes that object as well. I wouldn't even mind if there were no difference between the lines:
λ cat ~/gron_tmp |jq '.data[]' |gron
json = {};
json.a = "1";
json = {};
json.a = "2";
Although a commandline switch to implicitly treat the input as a list would be nice:
λ cat ~/gron_tmp |jq '.data[]' |gron --assume-list
json[0] = {};
json[0].a = "1";
json[1] = {};
json[1].a = "2";
Hey, sorry this hasn't had the attention it needs... Kids keep you pretty busy!
I've been giving some thought to the approach needed for this, and there are probably only two sane options:
- Require that the input be one JSON blob per line so it's easy to split on \n
- Do a pre-parse step to detect multiple JSON objects in the input
Option 1 is by far the easiest to implement, but it doesn't work for @jan-schulz-k24's example where each JSON blob spans multiple lines.
Option 2 is far more permissive, but it requires a rune-by-rune inspection of the input text (you can't just, say, regex for }[^,]*{ because that sequence could appear in a string value).
The problem with option 2 is that it's pretty expensive to do, especially when the input is very large. This is made slightly better by only enabling multi-object input when a command line flag is specified.
On balance I think that gron working in more situations is more important than performance, so option 2 is probably best.
jq -c leaves one object per line. So IMO:
- it would be OK if gron worked with JSON Lines/ndjson out of the box without sacrificing speed, and
- only switched to the expensive "multiple lines per JSON object and multiple objects" parsing if a command line flag is given.
@jan-schulz-k24 @filippog firstly: thank you for your patience!
I've added basic support for multi-object input in b9faf397
At the moment it only supports one object per line; I'm not 100% convinced this is the best solution but it's certainly the easiest to implement.
You can use the feature with the -s/--stream flag:
tom@work:~▶ cat stream.json
{"one": 1, "two": 2, "three": [1, 2, 3]}
{"one": 1, "two": 2, "three": [1, 2, 3]}
tom@work:~▶ gron --stream stream.json
json = [];
json[0] = {};
json[0].one = 1;
json[0].three = [];
json[0].three[0] = 1;
json[0].three[1] = 2;
json[0].three[2] = 3;
json[0].two = 2;
json[1] = {};
json[1].one = 1;
json[1].three = [];
json[1].three[0] = 1;
json[1].three[1] = 2;
json[1].three[2] = 3;
json[1].two = 2;
Internally it reads the input line by line, so it starts to produce output as soon as a line is available to read. In the example below, the output appears in three chunks with two-second intervals between them.
tom@work:~▶ cat delay.sh
#!/bin/bash
echo '{"one": 1, "two": 2}'
sleep 2
echo '{"three": 3, "four": 4}'
sleep 2
echo '{"five": 5, "six": 6}'
tom@work:~▶ ./delay.sh | gron -s
json = [];
json[0] = {};
json[0].one = 1;
json[0].two = 2;
json[1] = {};
json[1].four = 4;
json[1].three = 3;
json[2] = {};
json[2].five = 5;
json[2].six = 6;
This should make it possible to work with streaming HTTP APIs, most of which seem to provide one object per line.
I haven't tagged a release yet, and I'm going to leave this issue open for a while longer because I'd like to think more about supporting objects that span many lines.
Let me know your thoughts / if you have any problems.
Thanks again for your patience at this particularly busy time in my life! :laughing:
Thanks @tomnomnom for working on this! I did a quick test with the dataset I have and it works great with --stream!
I've got JSON output with multiple objects, but they're not one per line, so gron currently only handles the first of these objects. I do have a hacky/sketchy patch for stream mode which handles this case, but obviously I don't want to step on any toes if there's another solution in the works?
@rjp I also ran into this issue (specifically with the GitHub CLI tool's --paginate option), and I ended up with this hacky sed one-liner to work around it:
gh api --paginate /repos/{owner}/{repo}/environments | sed -E 's|\}\{|\}\n\{|g' | gron --stream