gron shell-friendly output formatting

I'm super excited by this project. I've been dreaming of something like this for a while. jq is powerful but it won't do for certain use cases.

The issue: when working with classical shell tools such as cut, the current output format is a bit obnoxious.

▶ gron "https://api.github.com/repos/tomnomnom/gron/commits?per_page=1" | fgrep "commit.author"
json[0].commit.author = {};
json[0].commit.author.date = "2016-07-02T10:51:21Z";
json[0].commit.author.email = "[email protected]";
json[0].commit.author.name = "Tom Hudson";

The ideal would be the following:

▶ gron "https://api.github.com/repos/tomnomnom/gron/commits?per_page=1" --output-shell | fgrep "commit.author"
0.commit.author.date=2016-07-02T10:51:21Z
[email protected]
0.commit.author.name=Tom Hudson

Essentially:

Skip printing inner nodes (such as json[0].commit.author = {};); only print leaves;
Don't end the line with a semicolon;
Don't quote strings;
Don't put spaces around the =;
Leave out the starting json;
Array indices as foo.0.bar instead of foo[0].bar;
BONUS: make the separator (=) configurable.

If you agree that this is the correct thing to do, then I would be happy to write a PR.

Apr 03 '18 13:04 carlpaten

Hi @lilred! Thanks for raising an issue :)

It's certainly an interesting idea, but I have a few reservations about such a format. The main problem I foresee is that I don't think it would be possible to parse the format reliably, which somewhat defeats the purpose.

Consider this pretty horrible - but perfectly valid - snippet of JSON:

{"1.key.one=foo":"null"}

The output in the proposed format would be something like:

1.key.one=foo=null

This isn't parseable (by machines or humans). There's no way to know whether the dots were part of a single key or separated nested keys, and no way to know which = character is the assignment (either could have existed in the key or in the value). Type information is lost too: the value was actually the string 'null', not an actual null value, but there is no way to tell the two apart. And because strings containing only digits (e.g. "123") are valid object keys, information about the type of the container (object or array) could be lost as well.

It would be nice to make the output format a bit more friendly, but I think it's very difficult to do so without introducing these kinds of ambiguity issues. All of the solutions I can think of to fix this problem pretty much just lead back to using the existing format.

Perhaps take a look at the input grammar and see if you can come up with an equivalent that removes enough ambiguity to be parseable whilst remaining 'cleaner'?

Apr 03 '18 13:04 tomnomnom

These are some very good objections. I'll chime back in later with a proper response. It will probably rely on the following three assumptions, which I'd like to see you validate:

In the context of shell-friendly output:

Losing type information is sometimes OK. Shells don't distinguish between "123" and 123, so any output format meant for consumption by shell tools should also erase that distinction.
More generally, lossy round-tripping is sometimes OK. Shells also don't distinguish between null, undefined, and "".
For "pathological" (though likely common) records like the one you describe, it should be possible to pick different separators for field nesting (.) and assignment (=).

Re-parsing this kind of format is not a typical use case and could even be left out altogether, but a best-effort approach would be an acceptable alternative IMHO.
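A best-effort reader along those lines might look like the Go sketch below. parseLine is a hypothetical function, and "/" plus a tab are arbitrary example separators, not options that gron or gron2shell provide. It splits at the first assignment separator and then splits the path on the nesting separator, returning every value as a plain string (so the type loss is accepted, as argued above).

```go
package main

import (
	"fmt"
	"strings"
)

// parseLine (hypothetical) is a best-effort reader for one line of the
// proposed format. It splits at the FIRST assignment separator, then splits
// the left-hand side on the nesting separator. Values are plain strings, so
// "123" vs 123 and "null" vs null are not distinguished (lossy by design).
// If a key contains either separator, the result is wrong; picking
// separators that do not occur in the data is up to the caller.
func parseLine(line, nest, assign string) (path []string, value string, ok bool) {
	i := strings.Index(line, assign)
	if i < 0 {
		return nil, "", false
	}
	return strings.Split(line[:i], nest), line[i+len(assign):], true
}

func main() {
	// Default separators: the pathological key is misread as three keys.
	p, v, _ := parseLine("1.key.one=foo=null", ".", "=")
	fmt.Printf("%q %q\n", p, v) // ["1" "key" "one"] "foo=null"

	// With "/" for nesting and a tab for assignment, the same record
	// (flattened as "1.key.one=foo\tnull") comes back as a single key.
	p, v, _ = parseLine("1.key.one=foo\tnull", "/", "\t")
	fmt.Printf("%q %q\n", p, v) // ["1.key.one=foo"] "null"
}
```

Choosing separators per data set only moves the ambiguity rather than removing it, which is why this stays best-effort.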

I'm eager to hear your thoughts.

Apr 03 '18 13:04 carlpaten

@lilred I think you raise good points :)

I think what this really needs is some experimentation to get a feel for how useful various output formats may be.

In light of that, I've hijacked the parser from gron and shoe-horned it into a hacky little tool called gron2shell.

It should install fine with go get -u github.com/tomnomnom/hacks/gron2shell.

The tool accepts gron's output format on stdin, tokenises it, and then re-outputs on stdout via a big ol' switch/case statement so you can get fine-grained control over what happens with each token.

I've implemented your suggested format as a starting point (although I have not yet implemented configurable separators).

Here's a demo of it in use:

▶ gron "https://api.github.com/repos/tomnomnom/gron/commits?per_page=1" | fgrep "commit.author" | gron2shell 
0.commit.author.date=2018-04-03T13:55:59Z
[email protected]
0.commit.author.name=Tom Hudson

I encourage you to grab the source and have a play around with it; check if it works well for your use-cases etc.

If we can come up with a good format with clear benefits I can merge it into gron properly; if we can't, at least you've got a relatively easy way to get the output format you want.

Apr 04 '18 10:04 tomnomnom

This is absolutely fantastic, thank you!!

Apr 08 '18 19:04 carlpaten

@lilred no problemo :) Let me know if you come up with any insights from playing with it!

Apr 09 '18 09:04 tomnomnom
