Precedence of variable binding and concatenation
Hi there,
I'm currently trying Jaq as a thread-safe replacement for libjq. I saw many differences, notably around the handling of null values (which is partially fixed in v3.0-alpha). However, I stumbled upon a case which might be a bug.
Given this JQ script:
{
output: [
.array,
.integer as $value | .array | map ($value)
]
}
And this JSON input:
{
"array": [1, 2],
"integer": 42
}
I was expecting:
- output[0] to be the input array
- output[1] to be the integer in an array having the same size as the input array
But this produces:
{
"output" : [
[
[1, 2],
[1, 2]
],
[42, 42]
]
}
The input array seems to get mapped too.
A simple fix is to add parentheses around the JQ expression for output[1]:
{
output: [
.array,
(.integer as $value | .array | map ($value))
]
}
Still, I would like to know whether this is normal or a bug.
Admittedly, I know that libjq itself can be a bit misleading in some cases, such as:
{
output: [
.array,
.array | map (. + 1)
]
}
// vs
{
output: [
.array,
(.array | map (. + 1))
]
}
In the first case, both arrays are mapped and incremented, which is similar to what Jaq is doing.
Also, do you have an exclusive list of expected differences between libjq and Jaq ?
Thanks
Bonjour @TurpIF,
thanks, this is indeed an undocumented divergence between jaq and jq. The precedence of operators in jq is documented at: https://github.com/jqlang/jq/wiki/jq-Language-Description#operators-priority. I will see whether I can correct this with reasonable effort, or whether there is a deeper reason for this divergence. But I think that this will be doable.
Also, do you have an exclusive list of expected differences between libjq and Jaq ?
Have you seen the "Compatibility" sections in the jaq manual? These aspire to be complete.
Is this related to https://github.com/jqlang/jq/pull/3326?
@wader, it is not directly related, but ...
This is a pretty tough nut to crack after all.
On my quest to resolve this issue, I tried to handle | and as $x | as regular binary operators via precedence climbing. For that, I used the precedence table on the jq wiki. That made parsing of the following expression fail with an "undefined variable" error:
1 as $x | 2 | [$x]
Why? Because by the precedences on the jq wiki, as $x | binds stronger than |, so the filter is equivalent to:
(1 as $x | 2) | [$x]
And that is also an error in jq. So I believe that either the precedence table is flawed, or I misinterpret it somehow.
Next, and now we are getting to your issue, @wader, I tried jqjq, and I found that it yields the same result as jq for 1, 2 as $x | 3 | [$x], namely 1 [3], but it yields an error for 1 + 2 as $x | 3 | [$x], whereas jq yields [3]. That is a result of https://github.com/jqlang/jq/pull/3326, because jqjq handles as $x | as a suffix.
I found yet another weird thing: What do you think that the following filter yields?
1, 2 as $x | [$x] | [.]
To sum this up, I'm utterly confused by the precedences of "|", ",", and "as $x |". If somebody can explain their behaviour to me, I can try to make jaq implement it. But without that, I'm not sure whether I can do this.
Interesting, and i'm also quite confused 😬, but what version of jq did you test? i get this but maybe you meant the wiki and latest jq is not in sync?
$ jq -cn '1 as $x | 2 | [$x]'
[1]
$ jq --version
jq-1.8.0
hope i will get some time this weekend to try some things out with jqjq using inspirations from https://github.com/01mf02/jaq/pull/370
Next, and now we are getting to your issue, @wader, I tried jqjq, and I found that it yields the same result as jq for 1, 2 as $x | 3 | [$x], namely 1 [3], but it yields an error for 1 + 2 as $x | 3 | [$x], whereas jq yields [3]. That is a result of jqlang/jq#3326, because jqjq handles as $x | as a suffix.
Inspired by your parse "as $x |" via precedence climbing PR i did this https://github.com/wader/jqjq/pull/37 which seems to work quite well but it feels a bit ugly code-wise.
I found yet another weird thing: What do you think that the following filter yields?
1, 2 as $x | [$x] | [.]
To sum this up, I'm utterly confused by the precedences of "|", ",", and "as $x |". If somebody can explain their behaviour to me, I can try to make jaq implement it. But without that, I'm not sure whether can do this.
With the above jqjq PR at least jaq and jqjq agree on this :)
But this feels broken with jq:
$ jq -cn '1, 2 | 3 | [.]'
[3]
[3]
$ jq -cn '1, 2 as $x | 3 | [.]'
1
[3]
🤔
@wader wrote:
But this feels broken with jq
Actually, once one understands the basics of jq's streams and as, the contrasting examples you gave seem very natural.(*) On such very basic and well-established matters, I think it's better to encourage people to develop an understanding and even an appreciation of what might at first "feel broken".
The fable about the Man, his Son and the Donkey comes to mind, and more especially, the moral: "He who tries to please all, pleases none".
In short: If it ain't broke, don't fix it. Or if you prefer: Leave well-enough alone.
(*) The point being that the 2 as $x in the second expression is not an ordinary expression - it causes the entire expression to be evaluated as:
jq -nc '1, (2 as $x | 3 | [.]) '
1
[3]
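The contrast between the two commands can be modelled outside jq. The following is a minimal Python sketch, not jq's implementation: a filter is taken to be a function from one input to a stream of outputs, and the helper names (const, comma, pipe, bind, wrap) are hypothetical, introduced only for this illustration.

```python
# A filter maps one input value to an iterator of output values.

def const(value):
    """Filter that ignores its input and emits `value` once."""
    return lambda inp: iter([value])

def comma(f, g):
    """`f, g`: all outputs of f, then all outputs of g, on the same input."""
    return lambda inp: (out for h in (f, g) for out in h(inp))

def pipe(f, g):
    """`f | g`: run g once for every output of f."""
    return lambda inp: (out for mid in f(inp) for out in g(mid))

def bind(f, body):
    """`f as $x | body`: for every output x of f, run body(x) on the ORIGINAL input."""
    return lambda inp: (out for x in f(inp) for out in body(x)(inp))

# `[.]`: wrap the input in a one-element array.
wrap = lambda inp: iter([[inp]])

# (1, 2) | 3 | [.]
first = pipe(comma(const(1), const(2)), pipe(const(3), wrap))

# 1, (2 as $x | 3 | [.])
second = comma(const(1), bind(const(2), lambda x: pipe(const(3), wrap)))

print(list(first(None)))   # [[3], [3]]
print(list(second(None)))  # [1, [3]]
```

In the first filter both 1 and 2 flow into the pipe, so two results appear. In the second, only 2 is bound, and 1 is emitted on its own by the comma.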
Aha my bad i was too quick, assumed | and ... as ... | had same precedence but only now noticed in the wiki that as-binop has higher precedence. But i'm still confused, will play around with the different implementations and see if things get clearer :)
@wader wrote:
will play around with the different implementations ...
Implementation is obviously important, but the basic principles are more important. I think you'll find the tricky issues (both w.r.t. specification and implementation) arise mainly because of the infix and prefix operators (+ - * /). In accordance with the leave well-enough alone principle, perhaps at this point it's generally best to emphasize the importance of parentheses for clarity.
@pkoppstein, I would appreciate your insight on this matter. Consider the following jq filter:
1, 2 as $x | 3, 4 | 5 as $y | 6, 7
How would you put the parentheses for this program? What algorithm do you use?
For the current state in jaq, the algorithm is simply precedence climbing, where:
- | and as $_ | are right-associative and have precedence 0, and
- , is left-associative and has precedence 1.
The result of precedence climbing is:
- 1, 2 as $x | 3, 4 | 5 as $y | 6, 7 (original)
- (1, 2) as $x | (3, 4) | 5 as $y | (6, 7) (handle commas)
- (1, 2) as $x | ((3, 4) | (5 as $y | (6, 7))) (handle pipes)
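The precedence climbing described above can be sketched in a few lines of Python. This is an illustration under assumptions, not jaq's actual parser (which is written in Rust): the tokenizer only understands simple terms, ",", "|" and "as $x |", and the function names tokenize, prec, parse and show are made up for this example.

```python
import re

def tokenize(src):
    """Split a filter into terms and operators; `as $x |` becomes one operator token."""
    raw = re.findall(r"\$?\w+|[,|]", src)
    out, i = [], 0
    while i < len(raw):
        if raw[i] == "as":
            out.append(f"as {raw[i + 1]} |")  # fuse `as`, `$x`, `|`
            i += 3
        else:
            out.append(raw[i])
            i += 1
    return out

def prec(op):
    # `,` has precedence 1; `|` and `as $x |` share precedence 0.
    return 1 if op == "," else 0

def parse(tokens, min_prec=0):
    """Precedence climbing over a token list (the list is consumed in place)."""
    lhs = tokens.pop(0)
    while tokens and prec(tokens[0]) >= min_prec:
        op = tokens.pop(0)
        # Left-associative `,` raises the bar for its right operand;
        # right-associative `|` and `as $x |` keep it.
        rhs = parse(tokens, prec(op) + 1 if op == "," else prec(op))
        lhs = (op, lhs, rhs)
    return lhs

def show(tree, top=True):
    """Render the tree with explicit parentheses around every nested operator."""
    if isinstance(tree, str):
        return tree
    op, lhs, rhs = tree
    sep = ", " if op == "," else f" {op} "
    text = show(lhs, False) + sep + show(rhs, False)
    return text if top else f"({text})"

print(show(parse(tokenize("1, 2 as $x | 3, 4 | 5 as $y | 6, 7"))))
# (1, 2) as $x | ((3, 4) | (5 as $y | (6, 7)))
```

The printed grouping binds $x to both 1 and 2, which is the behaviour reported at the top of this issue.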
What does jq do here?
@01mf02 wrote:
1, 2 as $x | 3, 4 | 5 as $y | 6, 7
How would you put the parentheses for this program? What algorithm do you use?
The key, I think, is to understand that _ as $x is not a jq expression but part of a construct: _ as $x | Y.
In effect, the as causes the _ as $x subexpression to be bound tightly to Y so that p, _ as $x | Y is understood as p, (_ as $x | Y) even if Y is itself a pipeline.
So, to answer your first question explicitly:
[1, 2 as $x | 3, 4 | 5 as $y | 6, 7] == [1, (2 as $x | 3, 4 | (5 as $y | 6, 7))]
(Please note that I'm not implying I would have chosen or would advocate for this semantics, but gojq follows jq's lead, so I think that at least makes it "acceptable".)
Thanks for your input, @pkoppstein. With that, I was able to come up with an algorithm to parse as $x | like in jq: https://github.com/01mf02/jaq/pull/370
However, I'm not sure whether this is a good idea. It's a pretty complex algorithm (making up 4.2% of jaq's overall parsing code!) that has a negative impact on performance. I also believe that teaching users this behaviour is challenging. It's probably also difficult for users to understand it.
I think that it would be better for the language if the precedence rules were simplified such that , has a precedence higher than both | and as $x |. That makes things easier to understand (just regular precedences), easier to teach, faster to parse, and easier to implement.
@01mf02 - obviously the main issue w.r.t. change is backwards compatibility. Perhaps this is something for 2.0? Anyway, I’d suggest asking for @itchyny’s input.
@pkoppstein, yes, I agree. Although, as @wader pointed out, https://github.com/jqlang/jq/pull/3326 already changed the precedence of as $x | in jq 1.7 -> 1.8, so my proposed change might also work for jq 1.8 -> 1.9.
Perhaps to further motivate the proposed change: The current behaviour of jq is not documented AFAIK, and documenting it would probably confuse users quite a bit. In particular, the current behaviour could be described as follows:
Precedences are given by a precedence table. Before that, in every sequence of terms separated by binary operators, every occurrence of as $x | has to be surrounded with parentheses first, where:
- The left parenthesis extends to the rightmost occurrence of | or , to the left of as $x |. If there is no such occurrence, it extends to the beginning of the sequence.
- The right parenthesis extends to the end of the sequence.
I believe that this description is correct. However, I think that it is quite difficult to grasp.
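To make the description concrete, here is a small Python sketch of just the parenthesising pre-pass (the precedence table would be applied afterwards). It assumes an input without existing parentheses, treats a preceding "as $x |" as an occurrence of "|", and uses hypothetical names (tokenize, is_separator, parenthesize).

```python
import re

def tokenize(src):
    """Split a filter into terms and operators; `as $x |` becomes one operator token."""
    raw = re.findall(r"\$?\w+|[,|]", src)
    out, i = [], 0
    while i < len(raw):
        if raw[i] == "as":
            out.append(f"as {raw[i + 1]} |")
            i += 3
        else:
            out.append(raw[i])
            i += 1
    return out

def is_separator(tok):
    # Assumption: an earlier `as $x |` counts as an occurrence of `|`.
    return tok in (",", "|") or tok.startswith("as ")

def parenthesize(src):
    """Surround every `as $x |` occurrence with parentheses as described above."""
    tokens = tokenize(src)
    opens = set()
    for i, tok in enumerate(tokens):
        if tok.startswith("as "):
            j = i - 1
            # Walk left to the rightmost `|` or `,` (or the start of the sequence).
            while j >= 0 and not is_separator(tokens[j]):
                j -= 1
            opens.add(j + 1)
    out = []
    for k, tok in enumerate(tokens):
        if k in opens:
            out.append("(")
        out.append(tok)
    # Every right parenthesis extends to the end of the sequence.
    out.extend(")" * len(opens))
    text = " ".join(out)
    return text.replace("( ", "(").replace(" )", ")").replace(" ,", ",")

print(parenthesize("1, 2 as $x | 3, 4 | 5 as $y | 6, 7"))
# 1, (2 as $x | 3, 4 | (5 as $y | 6, 7))
```

For this input the sketch reproduces the grouping that @pkoppstein gave earlier in the thread.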
@01mf02 - Please note that https://github.com/jqlang/jq/pull/3326 concerns the pseudo-arithmetic operators, for which the parsing rules were long understood to be problematic, so in the context of assessing the significance of backward-compatibility issues, I don't think that that PR is a good precedent for the change now under consideration, which by contrast affects the most basic expressions involving |, , and as.
Regarding existing documentation, it may be worth noting here that the "jq Language Description" (https://github.com/jqlang/jq/wiki/jq-Language-Description) has an authoritative precedence table, and more importantly for the present discussion, there's this:
Data symbol bindings are introduced with expr as $NAME | .... The | is required. The binding is visible to all expressions to the right of the |.
This could reasonably be interpreted to mean that 1, 2 as $x | Y should be understood as 1, (2 as $x | Y), since anything else would warrant further elaboration. On this, however, I'll gladly defer to @nicowilliams.
@pkoppstein, thanks again for your input, especially your quote from the jq language description:
Data symbol bindings are introduced with expr as $NAME | .... The | is required. The binding is visible to all expressions to the right of the |.
This gave me a key insight to how to handle as $x | precedence in a nice way in jaq (and probably also jqjq, @wader), see https://github.com/01mf02/jaq/pull/370/commits/65dbd27ba41c5227ef7cd3ba0572ed92fed7248c.
The idea is to replace all occurrences of ... as $x | ... by ... as $x | (...), where the parentheses to the right encompass all following terms and operators.
Once we have done this, we can use regular precedence climbing. That means also that the precedence table as given in the jq language description is indeed correct.
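A rough Python sketch of this two-step idea follows. It is not the code from the linked commit. It assumes an input without existing parentheses and a precedence ordering of "|" below "," below "as $x |", chosen to match the groupings reported in this thread rather than copied from the wiki table. All function names are hypothetical.

```python
import re

PREC = {"|": 0, ",": 1}  # assumed ordering; `as $x |` gets 2 in prec()

def tokenize(src):
    """Split a filter into terms and operators; `as $x |` becomes one operator token."""
    raw = re.findall(r"\$?\w+|[,|]", src)
    out, i = [], 0
    while i < len(raw):
        if raw[i] == "as":
            out.append(f"as {raw[i + 1]} |")
            i += 3
        else:
            out.append(raw[i])
            i += 1
    return out

def wrap_bindings(tokens):
    """Step 1: rewrite `... as $x | ...` to `... as $x | (...)`, up to the end."""
    out, pending = [], 0
    for tok in tokens:
        out.append(tok)
        if tok.startswith("as "):
            out.append("(")
            pending += 1
    return out + [")"] * pending

def prec(op):
    return PREC.get(op, 2)

def is_op(tok):
    return tok in PREC or tok.startswith("as ")

def atom(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        inner = parse(tokens)
        tokens.pop(0)  # drop the matching ")"
        return inner
    return tok

def parse(tokens, min_prec=0):
    """Step 2: regular precedence climbing over the rewritten tokens."""
    lhs = atom(tokens)
    while tokens and is_op(tokens[0]) and prec(tokens[0]) >= min_prec:
        op = tokens.pop(0)
        # `,` is left-associative; `|` and `as $x |` are right-associative.
        rhs = parse(tokens, prec(op) + 1 if op == "," else prec(op))
        lhs = (op, lhs, rhs)
    return lhs

def show(tree, top=True):
    if isinstance(tree, str):
        return tree
    op, lhs, rhs = tree
    sep = ", " if op == "," else f" {op} "
    text = show(lhs, False) + sep + show(rhs, False)
    return text if top else f"({text})"

def group(src):
    return show(parse(wrap_bindings(tokenize(src))))

print(group("1, 2 as $x | 3, 4 | 5 as $y | 6, 7"))
# 1, (2 as $x | ((3, 4) | (5 as $y | (6, 7))))
print(group("1 as $x | 2 | $x"))
# 1 as $x | (2 | $x)
```

The second call shows that $x stays in scope for everything to the right of the binding, so 1 as $x | 2 | [$x] no longer ends up as (1 as $x | 2) | [$x].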
Precedence issues aside... I've wondered why ... as $x | ... instead of something like $x = ... | .... I think the latter would be very handy because one could then do things like ($x[] |= ...) | ... much as one can (.[] |= ...) | .... If jq is going to have a pretend mutation assignment (it's "pretend" only as far as syntax goes -- new data is produced, and here new bindings would be produced) then why not go all the way?
I agree that this precedence in jq is unintuitive. I hope we can fix this in 1.9. I'll fix this in gojq soon and see users' reactions. EDIT: I'll postpone the change in gojq, considering the impact of breaking existing scripts.