SUP-formatted values and SuperSQL literals often look the same, but are not always interchangeable
❯ super --version
Version: v1.18.0-350-g733dc02a1
❯ super -z -c "{foo:[]}"
{foo:[]([null])}
❯ super -z -c "{foo:[]([null])}"
parse error at line 1, column 8:
{foo:[]([null])}
   === ^ ===
❯ echo "{foo:[]([null])}" | super -z -c 'yield this' -
{foo:[]([null])}
I'm curious about this. (I know there's a related issue somewhere about outputting types, but I couldn't find it, so maybe this is just a side effect of that issue.) Why does passing in the record via stdin work, while inlining it into a command causes a parsing problem? And... should it?
I presume the stdin case works because something like this is happening under the covers:
❯ super -z -c "parse_zson('{foo:[]([null])}')"
{foo:[]([null])}
I think it's confusing to me because I can inline other structures without having to do this, so as I work in larger scripts, I don't always keep straight when I do something like:
function foo() {
  super -z -c "$my_data | ..."
}
vs.
function foo() {
  echo "$my_data" | super -z -c "..." -
}
Both work a lot of the time, but when a type like ([null]) creeps in, I get funky parsing errors that catch me off guard because I haven't been disciplined about always passing data in via stdin rather than inlining it.
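One way to stay disciplined about the stdin route is to wrap it in a small helper. The following is only a sketch: the name sup_query is hypothetical, and it assumes the same super -z -c '...' - invocation shown earlier in this thread.

```shell
# sup_query DATA QUERY
# Hypothetical helper: always feeds DATA to super on stdin, so it is read as
# SUP (where type decorators are allowed) instead of being embedded in the
# query text as a SuperSQL literal.
sup_query() {
  # printf avoids the backslash and option handling quirks of echo.
  printf '%s\n' "$1" | super -z -c "$2" -
}

# Usage, mirroring the stdin example earlier in the thread:
#   sup_query '{foo:[]([null])}' 'yield this'
```

Keeping the data and the query as separate arguments means the data never goes through the query parser, which is the case that tripped on ([null]) above.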
@chrismo: Yes, I understand this can be confusing. I've bumped up against this as a user, myself. You were on the right track with your hunch about parse_zson(), but below is how it was summarized to me when I discussed with the Dev team.
Using something like records as an example, the bottom line is that they often happen to look exactly the same in Super JSON (SUP) format and as literals in the SuperSQL language, but they're not always the same and aren't intended to be the same. Using an example similar to your own (I'm avoiding null because it makes everything more confusing 😛), I can turn some JSON into SUP:
$ super -version
Version: v1.18.0-366-g7f5ca96c0
$ echo '{"hello": "world", "num": 1}' | super -z -
{hello:"world",num:1}
Then turn right around and use that same SUP text as a record literal in my SuperSQL query.
$ super -c '{hello:"world",num:1} | yield this'
{hello:"world",num:1}
However, a type decorator is part of the SUP format but not the language (per the opening text here), so the same cut & paste doesn't work here:
$ echo '{"hello": "world", "num": 1}' | super -z -c 'num:=int8(num)' -
{hello:"world",num:1(int8)}
$ super -c '{hello:"world",num:1(int8)} | yield this'
parse error at line 1, column 21:
{hello:"world",num:1(int8)} | yield this
                === ^ ===
To affect the data type as I'm writing out the record literal, I could cast like this:
$ super -c '{hello:"world",num:int8(1)} | yield this'
{hello:"world",num:1(int8)}
Specifically, @mattnibs pointed out to me that this particular clash occurs because the language parser sees the parentheses as related to function calls, like in this contrived example:
$ echo '{int8:"hi"}' | super -c '{hello:"world",num:upper(int8)} | yield this' -
{hello:"world",num:"HI"}
So the reaction from the Dev team is that if we wanted to address this, we'd probably need to change how type decorators look in the SUP format. That would be a pretty significant change, and hence it's not something we're likely to take up right away. For example, some time after a first GA release of SuperDB, if we're contemplating a "v2" of the SUP format that might include other changes, this could perhaps be addressed as part of that. Before then we might think of something else creative we could do in terms of UX to make this less confusing, but there are not yet specific ideas on the table.
What I think I'll do for now is change the issue description and hold it open in the event other users have similar confusion. It can also serve as a reminder to take up the topic again down the road. Thanks for flagging!
Is this the same issue?
single-quoted string in echo, doesn't parse:
❯ echo "{id:1,task:'foo'}" | zq -z -
... zjson: line 1: malformed ZJSON: bad type object: "{id:1,task:'foo'}": unpacker error parsing JSON: invalid character 'i' looking for beginning of object key string ...
❯ echo "{id:1,task:'foo'}" | super -z -
... zjson: line 1: malformed ZJSON: bad type object: "{id:1,task:'foo'}": unpacker error parsing JSON: invalid character 'i' looking for beginning of object key string ...
But it works with double quotes on the foo string, and it also works with the single-quoted string inlined into the zq or super command.
single-quoted string inlined into the command:
❯ zq -z "{id:1,task:'foo'}"
{id:1,task:"foo"}
❯ super -z -c "{id:1,task:'foo'}"
{id:1,task:"foo"}
double-quoted string in echo, parses fine:
❯ echo '{id:1,task:"foo"}' | zq -z -
{id:1,task:"foo"}
❯ echo '{id:1,task:"foo"}' | super -z -
{id:1,task:"foo"}
@chrismo: Correct. When they're coming in via echo, those values you're showing are treated as the human-readable Super (SUP) format (which has been called "Super JSON (JSUP)" recently, but is in the process of being renamed), and the SUP format doc explains that strings there specifically need to be double-quoted. This is similar to how JSON requires double quotes around strings:
$ echo '{"id":1,"task":"foo"}' | jq .
{
  "id": 1,
  "task": "foo"
}
$ echo "{'id':1,'task':'foo'}" | jq .
jq: parse error: Invalid numeric literal at line 1, column 6
Meanwhile for the string literals created within the SuperSQL query itself, the Expressions doc explains that single or double quotes are fair game there.
Random thought: Despite the text that says both are allowed, maybe the subtlety gets missed because the SuperSQL docs almost always show string literals in double quotes. I feel like this is an artifact of so many of the examples being run on the command line, where wrapping the whole query in single quotes is often a smart decision because it avoids hazards like the shell interpreting $ as a variable reference. That leads to the habit of using double quotes within the query. What's funny is that now that we're thinking in more SQL-centric ways, this all gets flipped on its head, because most SQL queries in RDBMS materials are wrapped in double quotes and single quotes are used for strings within the query. So I'm kinda going through it a bit myself at the moment. 😅
Thx for the confirmation.
I'll confirm I love that both single-quoted and double-quoted work - since I'm usually working in shell scripts, I usually default to double quotes around the command text so I can interpolate in shell variables, with single quoted strings inlined.
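To make the two quoting habits discussed above concrete, here is a small sketch that only builds and prints the query text, without invoking super. The variable names are illustrative and not taken from any docs.

```shell
task='foo'

# Double-quoted shell string: the shell expands $task, and the single quotes
# inside are passed through as part of the query text.
query_interpolated="{id:1,task:'$task'}"

# Single-quoted shell string: nothing is expanded, so $task stays literal.
query_literal='{id:1,task:"$task"}'

printf '%s\n' "$query_interpolated"   # prints {id:1,task:'foo'}
printf '%s\n' "$query_literal"        # prints {id:1,task:"$task"}
```

Per the discussion above, the first form is fine as a literal inside a query, but the same text would be rejected if fed in on stdin, since there it is read as SUP and strings must be double-quoted.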
@chrismo: I'm pleased to report this one has been addressed. To summarize, the comment above mentioned that fixing this would likely require us to "change how type decorators look in the SUP format", and indeed that's what was done in linked PR #6009.
Verifying with your specific example, in super commit 7f23f65, now you can take the SUP output of the command:
$ super -version
Version: 7f23f6560
$ super -s -c "{foo:[]}"
{foo:[]::[null]}
And paste it right back in as a literal in a query:
$ super -s -c "{foo:[]::[null]}"
{foo:[]::[null]}
...and it's accepted rather than causing a parse error as it did in the past.
At the moment this is one of multiple recent breaking changes to SUP, so if someone has a lot of older ZSON data or earlier SUP format files and tries to read them with the latest super, it can cause problems (e.g., see this comment). If this is a concern, the quickest workaround is probably to use an older super binary to read the older ZSON/SUP and output BSUP, then read the BSUP with a newer super binary. We have not yet decided how many Dev cycles we may invest in backward compatibility. For example, we might maintain a flag in the newer super that explicitly asks to read the old SUP format, or we might try to auto-detect the older format and read it seamlessly with the older reader. Since SUP is technically a "new format" and has not yet appeared in any GA release, there's the possibility we may just break with the past and save the Dev cycles for other priorities, though I know this may create hassles for folks like yourself who have been along for the ride while things are under development. Feedback here in issues or on Slack is always welcome.
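The two-binary workaround could be scripted roughly as follows. This is a sketch under assumptions: the helper name migrate_sup and the BINARY_FORMAT variable are hypothetical, the binary paths are whatever you have locally, and it assumes the older binary accepts -f bsup (an older build may know the binary format under its earlier name, zng) while the newer binary accepts -s as shown above.

```shell
# migrate_sup OLD_BIN NEW_BIN INPUT OUTPUT
# Hypothetical helper: reads old-format ZSON/SUP with an older binary, writes
# the binary format to a temp file, then re-reads it with the newer binary
# and writes current-format SUP.
migrate_sup() {
  old_bin=$1
  new_bin=$2
  in_file=$3
  out_file=$4
  # Assumption: "bsup" is accepted by the older binary; override if not.
  fmt=${BINARY_FORMAT:-bsup}
  tmp=$(mktemp) || return 1
  "$old_bin" -f "$fmt" "$in_file" > "$tmp" &&
    "$new_bin" -s "$tmp" > "$out_file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage (paths are placeholders):
#   migrate_sup ./super-old ./super-new old-data.sup new-data.sup
```

Going through a temp file rather than a pipe keeps the two steps easy to inspect separately if one of them fails.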
For now, closing this one. Thanks again for flagging!
That's great! Also, thx for the BSUP upgrade suggestion - I may use that very soon for a project.