Precedence of variable binding and concatenation

Open TurpIF opened this issue 1 month ago • 19 comments

Hi there,

I'm currently trying Jaq as a thread-safe replacement for libjq. I saw many differences, notably around the handling of null values (which is partially fixed in v3.0-alpha). However, I stumbled upon a case which might be a bug.

Given this jq script:

{
  output: [
    .array,
    .integer as $value | .array | map ($value)
  ]
}

And this JSON input:

{
  "array": [1, 2],
  "integer": 42
}

I was expecting:

  • output[0] to be the input array
  • output[1] to be an array of the same size as the input array, with the integer in every position

But this produces:

{
  "output" : [
    [
      [1, 2],
      [1, 2]
    ],
    [42, 42]
  ]
}

The input array seems to get mapped too. A simple fix is to add parentheses around the expression for output[1]:

{
  output: [
    .array,
    (.integer as $value | .array | map ($value))
  ]
}
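
To make the two possible groupings concrete, here is a minimal Python model of jq-style streams. It is not jq or jaq code: the helpers `field`, `comma`, `pipe`, `bind` and `map_const` are hypothetical names invented for this sketch, and the "reported" grouping assumes that `,` binds tighter than `as $value |`, as described for jaq later in this thread.

```python
from typing import Any, Callable, Iterator

# A filter takes one input value and yields a stream of outputs.
Filter = Callable[[Any], Iterator[Any]]

def field(name: str) -> Filter:
    """Model of `.name`: emit the named field of the input."""
    def run(v):
        yield v[name]
    return run

def comma(f: Filter, g: Filter) -> Filter:
    """Model of `f, g`: outputs of f, then outputs of g, on the same input."""
    def run(v):
        yield from f(v)
        yield from g(v)
    return run

def pipe(f: Filter, g: Filter) -> Filter:
    """Model of `f | g`: run g once on every output of f."""
    def run(v):
        for x in f(v):
            yield from g(x)
    return run

def bind(f: Filter, body: Callable[[Any], Filter]) -> Filter:
    """Model of `f as $x | body`: run body on the original input, once per output of f."""
    def run(v):
        for x in f(v):
            yield from body(x)(v)
    return run

def map_const(x: Any) -> Filter:
    """Model of `map($x)`: replace every element of the input array by x."""
    def run(v):
        yield [x for _ in v]
    return run

doc = {"array": [1, 2], "integer": 42}
body = lambda value: pipe(field("array"), map_const(value))

# .array, (.integer as $value | .array | map($value))
expected_reading = comma(field("array"), bind(field("integer"), body))
# (.array, .integer) as $value | (.array | map($value))
reported_reading = bind(comma(field("array"), field("integer")), body)

print(list(expected_reading(doc)))  # [[1, 2], [42, 42]]
print(list(reported_reading(doc)))  # [[[1, 2], [1, 2]], [42, 42]]
```

The second list matches the reported output, which suggests that the difference lies purely in how the expression is grouped, not in how it is evaluated.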

Still, I would like to know whether this is expected behaviour or a bug.

I know that libjq itself can be a bit misleading in some cases, such as:

{
  output: [
    .array,
    .array | map (. + 1)
  ]
}

// vs

{
  output: [
    .array,
    (.array | map (. + 1))
  ]
}

In the first case, both arrays are mapped and incremented, which is similar to what Jaq is doing.

Also, do you have an exhaustive list of expected differences between libjq and Jaq?

Thanks

TurpIF avatar Nov 13 '25 09:11 TurpIF

Bonjour @TurpIF,

thanks, this is indeed an undocumented divergence between jaq and jq. The precedence of operators in jq is documented at: https://github.com/jqlang/jq/wiki/jq-Language-Description#operators-priority. I will see whether I can correct this with reasonable effort, or whether there is a deeper reason for this divergence. But I think that this will be doable.

Also, do you have an exhaustive list of expected differences between libjq and Jaq?

Have you seen the "Compatibility" sections in the jaq manual? These aspire to be complete.

01mf02 avatar Nov 13 '25 10:11 01mf02

Is this related to https://github.com/jqlang/jq/pull/3326?

wader avatar Nov 13 '25 12:11 wader

@wader, it is not directly related, but ...

This is a pretty tough nut to crack after all. On my quest to resolve this issue, I tried to handle | and as $x | as regular binary operators via precedence climbing. For that, I used the precedence table on the jq wiki. That made parsing of the following expression fail with an "undefined variable" error:

1 as $x | 2 | [$x]

Why? Because by the precedences on the jq wiki, as $x | binds stronger than |, so the filter is equivalent to:

(1 as $x | 2) | [$x]

And that is also an error in jq. So I believe that either the precedence table is flawed, or I misinterpret it somehow.

Next, and now we are getting to your issue, @wader, I tried jqjq, and I found that it yields the same result as jq for 1, 2 as $x | 3 | [$x], namely 1 [3], but it yields an error for 1 + 2 as $x | 3 | [$x], whereas jq yields [3]. That is a result of https://github.com/jqlang/jq/pull/3326, because jqjq handles as $x | as a suffix.

I found yet another weird thing: What do you think that the following filter yields?

1, 2 as $x | [$x] | [.]

It yields `1 [[2]]`! That means that this filter is equivalent to `1, (2 as $x | ([$x] | [.]))`. However, `1, 2 | [.]` yields `[1] [2]`, which means that the precedence of `,` seems to change somehow.

To sum this up, I'm utterly confused by the precedences of "|", ",", and "as $x |". If somebody can explain their behaviour to me, I can try to make jaq implement it. But without that, I'm not sure whether I can do this.

01mf02 avatar Nov 14 '25 10:11 01mf02

Interesting, and I'm also quite confused 😬, but what version of jq did you test? I get this, but maybe you meant that the wiki and the latest jq are not in sync?

$ jq -cn '1 as $x | 2 | [$x]'
[1]
$ jq --version
jq-1.8.0

hope i will get some time this weekend to try some things out with jqjq using inspirations from https://github.com/01mf02/jaq/pull/370

wader avatar Nov 15 '25 10:11 wader

Next, and now we are getting to your issue, @wader, I tried jqjq, and I found that it yields the same result as jq for 1, 2 as $x | 3 | [$x], namely 1 [3], but it yields an error for 1 + 2 as $x | 3 | [$x], whereas jq yields [3]. That is a result of jqlang/jq#3326, because jqjq handles as $x | as a suffix.

Inspired by your PR that parses "as $x |" via precedence climbing, I did this: https://github.com/wader/jqjq/pull/37, which seems to work quite well but feels a bit ugly code-wise.

I found yet another weird thing: What do you think that the following filter yields?

1, 2 as $x | [$x] | [.]

To sum this up, I'm utterly confused by the precedences of "|", ",", and "as $x |". If somebody can explain their behaviour to me, I can try to make jaq implement it. But without that, I'm not sure whether I can do this.

With the above jqjq PR at least jaq and jqjq agree on this :)

But this feels broken with jq:

$ jq -cn '1, 2 | 3 | [.]'
[3]
[3]
$ jq -cn '1, 2 as $x | 3 | [.]'
1
[3]

🤔

wader avatar Nov 15 '25 18:11 wader

@wader wrote:

But this feels broken with jq

Actually, once one understands the basics of jq's streams and as, the contrasting examples you gave seem very natural.(*) On such very basic and well-established matters, I think it's better to encourage people to develop an understanding and even an appreciation of what might at first "feel broken".

The fable about the Man, his Son and the Donkey comes to mind, and more especially, the moral: "He who tries to please all, pleases none".

In short: If it ain't broke, don't fix it. Or if you prefer: Leave well-enough alone.

(*) The point being that the 2 as $x in the second expression is not an ordinary expression - it causes the entire expression to be evaluated as:

$ jq -nc '1, (2 as $x | 3 | [.])'
1
[3]
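
The contrast can be imitated with plain Python comprehensions. This is only an analogy, assuming the usual reading that `,` concatenates streams and `|` runs its right-hand side once per output of its left-hand side; the variable names are made up for the sketch.

```python
# 1, 2 | 3 | [.]   read as   (1, 2) | 3 | [.]
# Both 1 and 2 flow into the pipeline, so its tail runs twice.
without_binding = [[y] for _x in (1, 2) for y in (3,)]

# 1, 2 as $x | 3 | [.]   read as   1, (2 as $x | 3 | [.])
# Only 2 feeds the binding; 1 is emitted untouched.
with_binding = [1] + [[y] for _x in (2,) for y in (3,)]

print(without_binding)  # [[3], [3]]
print(with_binding)     # [1, [3]]
```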

pkoppstein avatar Nov 16 '25 09:11 pkoppstein

Aha, my bad, I was too quick: I assumed | and ... as ... | had the same precedence, and only now noticed in the wiki that as-binop has higher precedence. But I'm still confused, will play around with the different implementations and see if things get clearer :)

wader avatar Nov 16 '25 15:11 wader

@wader wrote:

will play around with the different implementations ...

Implementation is obviously important, but the basic principles are more important. I think you'll find the tricky issues (both w.r.t. specification and implementation) arise mainly because of the infix and prefix operators (+ - * /). In accordance with the leave well-enough alone principle, perhaps at this point it's generally best to emphasize the importance of parentheses for clarity.

pkoppstein avatar Nov 16 '25 23:11 pkoppstein

@pkoppstein, I would appreciate your insight on this matter. Consider the following jq filter:

1, 2 as $x | 3, 4 | 5 as $y | 6, 7

How would you put the parentheses for this program? What algorithm do you use?

01mf02 avatar Nov 17 '25 09:11 01mf02

For the current state in jaq, the algorithm is simply precedence climbing, where:

  • | and as $_ | are right-associative and have precedence 0, and
  • , is left-associative and has precedence 1.

The result of precedence climbing is:

  1. 1, 2 as $x | 3, 4 | 5 as $y | 6, 7 (original)
  2. (1, 2) as $x | (3, 4) | 5 as $y | (6, 7) (handle commas)
  3. (1, 2) as $x | ((3, 4) | (5 as $y | (6, 7))) (handle pipes)

What does jq do here?
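
The precedence-climbing rule just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not jaq's actual parser: `tokenize`, `binding`, `parse` and `show` are hypothetical helpers, atoms are limited to plain words and numbers, and `as $x |` is folded into a single operator token.

```python
import re

def tokenize(src):
    """Split into atoms and operators; `as $x |` becomes one operator token."""
    raw = re.findall(r"\$?\w+|[|,]", src)
    tokens, i = [], 0
    while i < len(raw):
        if raw[i] == "as":  # raw[i + 1] is the variable, raw[i + 2] the `|`
            tokens.append(f"as {raw[i + 1]} |")
            i += 3
        else:
            tokens.append(raw[i])
            i += 1
    return tokens

def binding(op):
    """(precedence, right_associative): `,` is 1/left, `|` and `as $x |` are 0/right."""
    return (1, False) if op == "," else (0, True)

def parse(tokens, min_prec=0):
    """Classic precedence climbing over a flat token list (consumed in place)."""
    lhs = tokens.pop(0)
    while tokens:
        prec, right = binding(tokens[0])
        if prec < min_prec:
            break
        op = tokens.pop(0)
        rhs = parse(tokens, prec if right else prec + 1)
        lhs = (op, lhs, rhs)
    return lhs

def show(node, top=True):
    """Render the tree, parenthesizing every compound operand."""
    if isinstance(node, str):
        return node
    op, lhs, rhs = node
    sep = ", " if op == "," else f" {op} "
    text = f"{show(lhs, False)}{sep}{show(rhs, False)}"
    return text if top else f"({text})"

print(show(parse(tokenize("1, 2 as $x | 3, 4 | 5 as $y | 6, 7"))))
# (1, 2) as $x | ((3, 4) | (5 as $y | (6, 7)))
```

The printed grouping is the same as step 3 above.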

01mf02 avatar Nov 17 '25 09:11 01mf02

@01mf02 wrote:

1, 2 as $x | 3, 4 | 5 as $y | 6, 7 How would you put the parentheses for this program? What algorithm do you use?

The key, I think, is to understand that _ as $x is not a jq expression but part of a construct: _ as $x | Y.

In effect, the as causes the _ as $x subexpression to be bound tightly to Y so that p, _ as $x | Y is understood as p, (_ as $x | Y) even if Y is itself a pipeline.

So, to answer your first question explicitly: [1, 2 as $x | 3, 4 | 5 as $y | 6, 7] == [1, (2 as $x | 3, 4 | (5 as $y | 6, 7))]

(Please note that I'm not implying I would have chosen or would advocate for this semantics, but gojq follows jq's lead, so I think that at least makes it "acceptable".)

pkoppstein avatar Nov 17 '25 10:11 pkoppstein

Thanks for your input, @pkoppstein. With that, I was able to come up with an algorithm to parse as $x | like in jq: https://github.com/01mf02/jaq/pull/370

However, I'm not sure whether this is a good idea. It's a pretty complex algorithm (making up 4.2% of jaq's overall parsing code!) that has a negative impact on performance. I also believe that teaching users this behaviour is challenging. It's probably also difficult for users to understand it.

I think that it would be better for the language if the precedence rules were simplified such that , has a precedence higher than both | and as $x |. That makes things easier to understand (just regular precedences), easier to teach, faster to parse, and easier to implement.

01mf02 avatar Nov 17 '25 13:11 01mf02

@01mf02 - obviously the main issue w.r.t. change is backwards compatibility. Perhaps this is something for 2.0? Anyway, I’d suggest asking for @itchyny’s input.

pkoppstein avatar Nov 17 '25 13:11 pkoppstein

@pkoppstein, yes, I agree. Although, as @wader pointed out, https://github.com/jqlang/jq/pull/3326 already changed the precedence of as $x | in jq 1.7 -> 1.8, so my proposed change might also work for jq 1.8 -> 1.9.

01mf02 avatar Nov 17 '25 15:11 01mf02

Perhaps to further motivate the proposed change: The current behaviour of jq is not documented AFAIK, and documenting it would probably confuse users quite a bit. In particular, the current behaviour could be described as follows:

Precedences are given by a precedence table. Before that, in every sequence of terms separated by binary operators, every occurrence of as $x | has to be surrounded with parentheses first, where:

  • The left parenthesis extends to the rightmost occurrence of | or , to the left of as $x |. If there is no such occurrence, it extends to the beginning of the sequence.
  • The right parenthesis extends to the end of the sequence.

I believe that this description is correct. However, I think that it is quite difficult to grasp.
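
As a rough illustration of the two bullet points, the following Python sketch applies the parenthesization rule to a flat filter string. It is an assumption-laden toy, not code from jq or jaq: `BINDING` and `group_bindings` are hypothetical names, and the string scan ignores nested brackets, existing parentheses and string literals.

```python
import re

# Matches one `as $name |` occurrence.
BINDING = re.compile(r"\bas\s+\$\w+\s*\|")

def group_bindings(src: str) -> str:
    """Surround every `as $x |` construct with parentheses, leftmost first."""
    m = BINDING.search(src)
    if m is None:
        return src.strip()
    # Left parenthesis: just after the rightmost `|` or `,` before the binding,
    # or at the very beginning if there is none (rfind returns -1, so cut is 0).
    cut = max(src.rfind("|", 0, m.start()), src.rfind(",", 0, m.start())) + 1
    head = src[:cut].strip()
    operand = src[cut:m.start()].strip()
    # Right parenthesis: at the end of the sequence, after grouping later bindings.
    body = group_bindings(src[m.end():])
    grouped = f"({operand} {m.group()} {body})"
    return f"{head} {grouped}" if head else grouped

print(group_bindings("1, 2 as $x | 3, 4 | 5 as $y | 6, 7"))
# 1, (2 as $x | 3, 4 | (5 as $y | 6, 7))
```

For this input the result coincides with the grouping given earlier in the thread for the same filter.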

01mf02 avatar Nov 18 '25 08:11 01mf02

@01mf02 - Please note that https://github.com/jqlang/jq/pull/3326 concerns the pseudo-arithmetic operators, for which the parsing rules were long understood to be problematic, so in the context of assessing the significance of backward-compatibility issues, I don't think that that PR is a good precedent for the change now under consideration, which by contrast affects the most basic expressions involving |, , and as.

Regarding existing documentation, it may be worth noting here that the "jq Language Description" (https://github.com/jqlang/jq/wiki/jq-Language-Description) has an authoritative precedence table, and more importantly for the present discussion, there's this:

Data symbol bindings are introduced with expr as $NAME | .... The | is required. The binding is visible to all expressions to the right of the |.

This could reasonably be interpreted to mean that 1, 2 as $x | Y should be understood as 1, (2 as $x | Y) since anything else would warrant further elaboration. On this, however, I'll gladly defer to @nicowilliams

pkoppstein avatar Nov 18 '25 10:11 pkoppstein

@pkoppstein, thanks again for your input, especially your quote from the jq language description:

Data symbol bindings are introduced with expr as $NAME | .... The | is required. The binding is visible to all expressions to the right of the |.

This gave me a key insight to how to handle as $x | precedence in a nice way in jaq (and probably also jqjq, @wader), see https://github.com/01mf02/jaq/pull/370/commits/65dbd27ba41c5227ef7cd3ba0572ed92fed7248c. The idea is to replace all occurrences of ... as $x | ... by ... as $x | (...), where the parentheses to the right encompass all following terms and operators. Once we have done this, we can use regular precedence climbing. That means also that the precedence table as given in the jq language description is indeed correct.
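
A toy version of that rewriting step, in Python, might look as follows. It is not the code from the linked commit: `BINDING` and `wrap_binding_bodies` are hypothetical names, and the sketch works on flat strings only, ignoring nested brackets and string literals. It shows just the rewrite; the subsequent precedence climbing is left out.

```python
import re

# Matches one `as $name |` occurrence.
BINDING = re.compile(r"\bas\s+\$\w+\s*\|")

def wrap_binding_bodies(src: str) -> str:
    """Rewrite `... as $x | ...` to `... as $x | (...)`, recursively."""
    m = BINDING.search(src)
    if m is None:
        return src.strip()
    # Keep everything up to and including `as $x |`, then parenthesize the rest.
    return f"{src[:m.end()].strip()} ({wrap_binding_bodies(src[m.end():])})"

print(wrap_binding_bodies("1, 2 as $x | 3, 4 | 5 as $y | 6, 7"))
# 1, 2 as $x | (3, 4 | 5 as $y | (6, 7))
```

After this rewrite, each binding's right-hand side is already a single parenthesized term, so an ordinary precedence table only has to decide how far the binding reaches to the left.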

01mf02 avatar Nov 18 '25 15:11 01mf02

Precedence issues aside... I've wondered why ... as $x | ... instead of something like $x = ... | .... I think the latter would be very handy because one could then do things like ($x[] |= ...) | ... much as one can (.[] |= ...) | .... If jq is going to have a pretend mutation assignment (it's "pretend" only as far as syntax goes -- new data is produced, and here new bindings would be produced) then why not go all the way?

nicowilliams avatar Nov 18 '25 16:11 nicowilliams

I agree that this precedence in jq is unintuitive. I hope we can fix this in 1.9. I'll fix this in gojq soon and see users' reactions. EDIT: I'll postpone the change in gojq, considering the impact of breaking existing scripts.

itchyny avatar Nov 24 '25 11:11 itchyny