Implicit stdin support / Unintuitive behaviour when omitting '-'
When attempting to onboard from jq, I've had somewhat unintuitive results exploring basic commands:
echo '{a:1,b:2,c:3}' | zq 'yield this' - => {a:1,b:2,c:3}
So far so good, though the docs list the default query when omitting one as being *, so let's try that:
echo '{a:1,b:2,c:3}' | zq '*' - => {a:1,b:2,c:3}
Cool, equivalent behaviour to what I get from having no query:
echo '{a:1,b:2,c:3}' | zq - => {a:1,b:2,c:3}
Except... This option of omitting the query entirely leads into some interesting territory. According to the --help docs:
If the first argument is a path to a valid file rather than a Zed query, then the Zed query is assumed to be "*", i.e., match and output all of the input. If the first argument is both a valid Zed query and an existing file, then the file overrides.
So what happens if we make a request with a valid query and allow standard input to be assumed as the input?
echo '{a:1,b:2,c:3}' | zq 'yield this' => null
Here's our first unexpected behaviour: not passing a query seems to ignore stdin. I thought maybe this was due to - counting as a valid zed query, but the following experiment shows otherwise:
echo '{a:1,b:2,c:3}' | zq '-' - => {a:1,b:2,c:3}
So let's try passing the real "default" query given by the CLI docs:
> echo '{a:1,b:2,c:3}' | zq '*'
zq: could not invoke zq with a single argument because:
- a file could not be found with the name "*"
- the argument did not parse as a valid Zed query
So the default zed query of '*' is not a valid zed query?
... Except when you pass stdin explicitly:
echo '{a:1,b:2,c:3}' | zq '*' - => {a:1,b:2,c:3}
As a new user, the issue seems to be that allowing implicit queries is a significant footgun and should be replaced with the more POSIX-friendly implicit stdin input convention. There may be some nuance here that makes shell scripting a lower-priority target than some other workflow that the zq tutorial hasn't introduced, but this is not apparent from day one, and the surprising nature of the CLI behaviour makes me hesitant to adopt this in lieu of JQ.
@Dessix: Thanks for this feedback. Indeed, the core Zed development team has also recognized there's room for improvement here. It helps to have the input of a user that's coming at this with fresh eyes. We'll have another round of discussion and see what we can improve here. I'll circle back with findings.
Wait, doesn’t that just announce the unintuitive behavior instead of correcting it?
@Dessix: Indeed, a PR merge triggered the early closure of this issue, so I'm reopening. As you noted, #5119 just rationalizes the tool's current behavior so it's no longer contradicting the docs. There's more material improvements we still plan to make that should more directly address the unintuitive behavior. We'll circle back with more detail soon.
Thank you @philrz ; I'll keep following this to make sure it doesn't get lost beneath GitHub noise then ^^
I believe the Closes 5110 note is why the PR closed it upon merge, incidentally.
@Dessix: Yes, it certainly was the Closes #5110. It would be nice if they had an easier way to link up PRs and Issues in advance without the auto-close trigger upon merge. Instead I've found I have to manually do the linkage after the fact. Anyway, separate topic. 😛
Here's the more detailed response to your concerns. The main improvement we still intend to make is what's described in #5116. That should help with the unexpected behavior you cited in this query:
echo '{a:1,b:2,c:3}' | zq 'yield this' => null
The reason that was happening was due to the "calculator mode" described in the closing of the Usage section of the zq command docs. That's not covered in the zq tutorial which is focused on the more common use cases of streaming/file/URL inputs. But it's helpful to know how you managed to bump into it while following just the tutorial. Looking at it through the eyes of a new user, indeed, we can see how zq's "implicit null input" is not intuitive there and it's unreasonable for us to expect users to read through enough docs to understand that behavior, let alone all like it.
When we add what's described in #5116, then null input behavior will go away and your example above will now return {a:1,b:2,c:3} as you originally expected. Users that still want the "calculator mode" will then have to add the -n flag, but since this is a less common use case it seems like a good trade-off to avoid confusing future users.
Regarding your other finding:
echo '{a:1,b:2,c:3}' | zq '-' - => {a:1,b:2,c:3}
That's expected. The single quotes in '-' get stripped by the shell, then as a developer colleague summarized for me:
zq - -says “read from standard input twice”. Standard input is at EOF afterzqhas read it once so reading it a second time produces no values.
This is not unique to zq, as you can see the same effect with echo 'hello' | cat '-' - for instance (run it under set -xv to see the effect of the shell stripping the quotes).
Finally, regarding a possible shift to an overall:
friendly implicit
stdininput convention
Your point is understood, and we're still discussing that one.
As part of the evolution to SuperDB we did make multiple changes that affect what's described in this issue. The tl;dr is that these docs accurately describe how the super command currently behaves when compiled from the tip of main, and if anyone has questions or feedback about the new state of things feel free to voice them here or on community Slack.
To offer some detail on some of the specific examples cited earlier in this issue, here's some highlights.
- The changes in #5346 eliminated ambiguity about queries being unexpectedly interpreted as input filenames, since the
-cflag is now used to precede query text. Therefore what was:
$ echo '{a:1,b:2,c:3}' | zq 'yield this' -
{a:1,b:2,c:3}
Now becomes:
$ super -version
Version: v1.18.0-206-g4084e011
$ echo '{a:1,b:2,c:3}' | super -c 'yield this' -
{a:1,b:2,c:3}
- The changes in #5436 eliminated ambiguity about queries intended as searches being unexpectedly treated as something else such as an implied
yield. This is handled by now requiring a leading?or use of thesearchkeyword before search text. Therefore what was:
$ echo '{a:1,b:2,c:3}' | zq '*' -
{a:1,b:2,c:3}
Now becomes:
$ echo '{a:1,b:2,c:3}' | super -c '? *' -
{a:1,b:2,c:3}
- Because the
-cnow makes queries explicit, we kept thenullvalue treatment that allows for easily invoking "calculator" mode (i.e., we did not take up what was described in #5116). Starting from what was noted above as prior behavior:
$ echo '{a:1,b:2,c:3}' | zq 'yield this'
null
Using the super command instead now gives an error message since the lack of explicit -c means this intended query text is now treated as a filename.
$ echo '{a:1,b:2,c:3}' | super 'yield this'
yield this: file does not exist
Once the user spots their mistake, if they add just the -c, now they see the null since the lack of - means they're now ignoring the input and instead are effectively in "calculator mode".
$ echo '{a:1,b:2,c:3}' | super -c 'yield this'
null
Once then once they add the -, they're reading stdin.
$ echo '{a:1,b:2,c:3}' | super -c 'yield this' -
{a:1,b:2,c:3}
As the docs also touch on, the overall direction of the changes is to nudge the tooling more toward behaviors of SQL tools rather than jq. This is part of why we didn't jump all the way to the proposed "more POSIX-friendly implicit stdin input convention", since SQL is historically more explicit about things like inputs (e.g., FROM) and so the remaining shortcuts in SuperDB tools for specifying inputs closely follow these.
Since we've settled pretty firmly on things as they're described here, I'm going to go ahead and close this issue.