
JEP 11: The let() Function

Open jamesls opened this issue 10 years ago • 13 comments

The JEP goes into detail about why we need this and how it works. I'll also link compliance tests and a sample implementation to this PR shortly.

cc @mtdowling

jamesls avatar Feb 25 '15 05:02 jamesls

Ok I've linked:

So far everything seems reasonable. Semantics make sense to me. We're borrowing from existing languages where this feature has been around for a really long time, so we're using well-tested concepts. The python implementation was really straightforward and concise (IMO). I'd imagine other implementations would have the same order of magnitude of code changes to implement this.

I'm curious to hear what others think, but this JEP is growing on me.

jamesls avatar Feb 25 '15 06:02 jamesls

cc @kyleknap @danielgtaylor, if you want to comment, though don't feel obligated.

jamesls avatar Feb 25 '15 06:02 jamesls

This is awesome. While it does complicate JMESPath a bit and requires non-trivial changes to the php, javascript, lua, clojure, ruby, etc... implementations... I think this feature is worth it. :+1:

mtdowling avatar Feb 25 '15 07:02 mtdowling

FWIW, I also have an initial implementation of the javascript version (https://github.com/jmespath/jmespath.js/commit/25c22b2192988d11e5fe40795c28281880e4f50e), and I would say the implementation is borderline trivial, which is what I like (and what surprised me) about this JEP. Same thing for the python implementation. Really all I needed was:

  1. Allow for the runtime/function modules to modify scope. For me, I just needed a way to push/pop scope objects.
  2. Change the lookup process for identifiers. After failing a lookup in the current object, fall back to scope chain lookups.
  3. Have some way to bind the current object when the expref arg is initially evaluated (not when the expression referenced by the expref is evaluated). Because all we see in a function body are the evaluated arguments, we need to know what the current object/context was when the function was called so that the expref has an input object. I didn't previously need this because functions like sort_by, max_by, and min_by evaluated the expref in the context of each individual list element, which was provided as an input argument. Here we need to know what the current object was at the time the let() args are resolved.

There might be a better way to do this, but as far as I can tell, this handles all the cases I can think of, is pretty easy to follow, and has very minimal runtime overhead. It should also be easy to generalize if other functions or things in the runtime want the ability to create lexical scopes.
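The push/pop scope objects and fallback lookup described in points 1–2 can be sketched in Python. This is a hypothetical illustration, not the actual jmespath.py internals; the class and function names are made up:

```python
class ScopedChainDict:
    """Stack of scope dicts; lookups fall through from newest to oldest."""

    def __init__(self):
        self._scopes = []

    def push_scope(self, scope):
        self._scopes.append(scope)

    def pop_scope(self):
        self._scopes.pop()

    def lookup(self, name, default=None):
        # Search from the innermost (most recently pushed) scope outward.
        for scope in reversed(self._scopes):
            if name in scope:
                return scope[name]
        return default


def resolve_identifier(current, name, scopes):
    """Try the current object first, then fall back to the scope chain."""
    if isinstance(current, dict) and name in current:
        return current[name]
    return scopes.lookup(name)
```

With `{'qux': 'qux'}` pushed, resolving `qux` against `{'bar': 'baz'}` falls through to the scope chain, mirroring the `let({qux: `qux`}, &foo.qux)` behavior discussed below.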

jamesls avatar Feb 25 '15 10:02 jamesls

Although I did notice a few interesting scenarios.

First one's not so bad, but consider this:

>>> data = {"foo": {"bar": "baz"}}
>>> jmespath.search('let({qux: `qux`}, &foo.qux)', data)
'qux'

In this scenario we're evaluating foo.qux. foo evaluates to {"bar": "baz"}. So far so good. Next we evaluate the RHS of the sub-expression, qux. We see that qux is not defined in the current object {"bar": "baz"}, so we look in the scope object, see it's defined as "qux", and use that value. I can see how some people might find that confusing. Maybe not.

The second one is more interesting. Right now, trying to evaluate an identifier on something that's not a JSON object, results in null:

>>> data = [0, 1, 2, 3, 4]
>>> print jmespath.search("foo", data)
None

However, what about something like this:

>>> data = [0, 1, 2, 3, 4]
>>> print jmespath.search('let({foo: `foo`}, &[0].foo)', data)

What would you expect that to print? The JEP isn't actually clear about this. You could say this is evaluated as:

  1. It's a sub expr, so evaluate the LHS then the RHS.
  2. The LHS is [0], which evaluates to 0.
  3. The RHS is foo. We look up foo in 0, which results in null.
  4. If we treat this null as a "failed lookup", we then defer to the scope chain, which has foo set to "foo", so the return value from this function is "foo".

On the other hand, you could say trying to evaluate an identifier with an input/current object that's not a JSON object will result in a null and will not fallback to scope chain lookups. This is what I currently have implemented in javascript/python, but I can see how someone might expect this to evaluate to "foo".
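The second interpretation (no scope fallback when the current object isn't a JSON object) can be sketched as follows; `resolve_strict` and the innermost-first scope list are hypothetical names for illustration:

```python
def resolve_strict(current, name, scope_chain):
    """Resolve an identifier; scope_chain is an innermost-first list of dicts."""
    if not isinstance(current, dict):
        # Non-object input: result is null, with no scope-chain fallback.
        return None
    if name in current:
        return current[name]
    for bindings in scope_chain:
        if name in bindings:
            return bindings[name]
    return None
```

Under this rule, looking up foo in 0 yields null even though foo is bound in scope.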

jamesls avatar Feb 25 '15 10:02 jamesls

The first case you brought up is interesting. I feel that the behavior you've outlined is confusing and not what I'd expect.

>>> data = {"foo": {"bar": "baz"}}
>>> jmespath.search('let({qux: `qux`}, &foo.qux)', data)
'qux'

In this case, I think that once you've descended into foo, you've entered into a new scope. In this new scope, qux is not defined and should evaluate to null.

mtdowling avatar Mar 07 '15 20:03 mtdowling

We've spoken about this in person, but I wanted to leave feedback here to hopefully help drive this proposal forward. I stated this earlier, but I think that the current behavior of this JEP would be confusing.

I think a better approach would be to allow for variables to be bound to specific scopes, which could then be referenced using a specific sigil. The first sigil that comes to mind for me is $. Perhaps a let function could have the same signature it does now, but instead of adding arbitrary identifiers to the bound expref, it could add named variables. These variables would then be referenced using $name notation, with a var = "$" identifier ABNF grammar rule.

Each scope would have access to the variables bound to that scope and any parent scope, and any variables bound in a specific let function that have the same name as the parent scope would override the parent scope. I don't think there needs to be a specific way to get the value of the same name from a parent scope (you could copy a variable binding to a new name in the child scope to work around this if necessary). Accessing an unbound variable would result in a parse or runtime error (depending on the sophistication of the parser).
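These rules (child bindings shadow the parent, unbound access is an error) can be sketched in a few lines; `resolve_var` and the innermost-first list are hypothetical:

```python
def resolve_var(name, scope_chain):
    """Resolve $name against an innermost-first list of binding dicts."""
    for bindings in scope_chain:
        if name in bindings:
            # The innermost binding shadows any same-named parent binding.
            return bindings[name]
    raise ValueError("undefined variable: $" + name)
```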

Here's an example.

Given {"a": 0}

let({foo: `"bar"`, baz: a}, &
  [$foo, $baz])

Result: ["bar", 0]

What do you think?

mtdowling avatar Apr 12 '15 03:04 mtdowling

I like that better, though we still have to define where exactly these variables are valid. For instance, in my example above I had let({foo: `foo`}, &[0].foo). The equivalent expression using the $ suggestion would be let({foo: `foo`}, &[0].$foo), which I wouldn't want to allow. However, I think it's reasonable to support something like $foo.bar. So we'd need updated grammar rules for when you can use these variables. It might be easier to describe this once #2 is done.

As for the specific char, I think my only hesitation was what character we'd use for dereferencing an expref (likely a later JEP). Given the expref char is &, it would have been awesome to use * as the complement, but I'm not entirely sure off hand how feasible that is given the current use of * in the existing grammar. Other than that, I think $ works.

jamesls avatar Apr 12 '15 18:04 jamesls

Awesome. Maybe we can use @ to deref (like Clojure)? I think this could be done in a way that does not conflict with the current-node grammar.

I think a variable should be allowed in the same places you would see the current-node token.

mtdowling avatar Apr 12 '15 19:04 mtdowling

Not opposed to it, but my preference is to use a separate token that's not being used yet if possible.

jamesls avatar Apr 14 '15 15:04 jamesls

I was thinking about let expressions recently, and I thought of a potential new syntax that might make it easier to read and work with. It would require built-in support and not utilize functions. We would add a new operator = that is an assignment. This would assign a variable to a named value returned by an expression, and you would then pipe this to expressions that would have the bound scope.

This expression would assign $foo to the current node and pipe that bound variable to an expression.

foo = @ | $foo.bar

We could add destructuring as well:

{foo: bar, baz: bam} = @ | not_null($foo, $baz)

The above expression would assign $foo to @.bar and $baz to @.bam.

We could potentially add list destructuring as well, but I'm not sure if it's necessary:

[foo, bar] = @ | not_null($foo, $bar)

The above expression would take @[0] and assign it to $foo and @[1] to $bar.

Each variable assignment would only be scoped to child subexpressions. So given the following expression, trying to get the value of $foo would be null (or maybe fail?).

[foo = bar | $foo.qux, $foo]


Another option: instead of using = with destructuring, we could use {} => RHS, where {} are key-value pair bindings exactly as in multi-select-hash, and RHS is the expression to evaluate with those bindings:

{foo: bar, baz: bam} => $foo

The new AST node would be something like Assignment with a LHS and RHS: the LHS holds the bindings and the RHS is the expression to interpret. One advantage of using this over pipe is that we can have a much more optimized tree interpreter that doesn't need to worry about pushing and popping binding frames. If we just have something like an Assignment node, then we know exactly where to push and pop frames.
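A toy interpreter fragment shows why an explicit Assignment node localizes the frame management to a single visit case (the node shapes here are invented for illustration):

```python
def visit(node, current, frames):
    """Evaluate a toy AST node; `frames` is a stack of binding dicts."""
    kind = node['type']
    if kind == 'assignment':
        # Evaluate the bindings in the current scope, then push exactly
        # one frame for the duration of the RHS.
        frame = {name: visit(expr, current, frames)
                 for name, expr in node['bindings'].items()}
        frames.append(frame)
        try:
            return visit(node['rhs'], current, frames)
        finally:
            frames.pop()
    if kind == 'variable':
        for frame in reversed(frames):
            if node['name'] in frame:
                return frame[node['name']]
        return None
    if kind == 'field':
        return current.get(node['name']) if isinstance(current, dict) else None
    if kind == 'current':
        return current
```

The push/pop pair lives entirely inside the 'assignment' branch, so no other node type has to know about binding frames.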

Finally, I would also recommend that expression references would now close over the variable bindings available when they are created. When executing an expref, you would merge in the current bindings over the closed over bindings.
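The closing-over behavior could look something like this; ExprRef and its methods are hypothetical names, not an existing API:

```python
class ExprRef:
    """An expression reference that captures the bindings at creation time."""

    def __init__(self, expr, captured_bindings):
        self.expr = expr
        self.captured = dict(captured_bindings)

    def bindings_at_call(self, current_bindings):
        # Merge the caller's current bindings over the captured ones;
        # on conflict, the current bindings win.
        merged = dict(self.captured)
        merged.update(current_bindings)
        return merged
```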

The => syntax is similar to Scala's anonymous functions: http://docs.scala-lang.org/tutorials/tour/anonymous-function-syntax.html. And C#: https://msdn.microsoft.com/en-us/library/bb397687.aspx. And Hack: https://docs.hhvm.com/hack/lambdas/introduction. And D: https://dlang.org/spec/expression.html#Lambda.

Various other languages use ->. Lots of inspiration and comparisons can be made using this collection of languages: http://rosettacode.org/wiki/Higher-order_functions

mtdowling avatar Jan 19 '16 07:01 mtdowling

I want this!

ghost avatar Sep 27 '19 13:09 ghost

I took a different approach to how this was implemented. Rather than try to implement a stack, I just used the call stack for the scope. This required making all visit_* functions accept **kwargs and forward the kwargs to any self.visit() call within those functions. Any time let() is called it creates a new scope dict, merges in any previous scope dict, and passes it to the visit() call. I found with your implementation that it was possible for the scope to be popped if an expression ever returns a deferred value (such as a generator). When the generator goes to access the scope stack, that value will have already been popped.

My implementation can be viewed here: https://github.com/brinkmanlab/BioPython-Convert/blob/master/biopython_convert/JMESPathGen.py

Forwarding kwargs also paves the way for future functionality to pass arbitrary values down the expression stack.

A third alternative, if the kwargs solution is not satisfactory, is to modify the visit_field() function to leverage the inspect library to walk up the call stack to the call to _func_let() and pull out a local scope variable. This is demonstrated here: https://stackoverflow.com/a/14694234
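A condensed sketch of the kwargs-forwarding idea described above; the class and node shapes are illustrative, not the real jmespath TreeInterpreter API:

```python
class TreeInterpreter:
    def visit(self, node, value, **kwargs):
        # Dispatch on node type, forwarding kwargs so the scope rides
        # along on the Python call stack instead of an explicit stack.
        method = getattr(self, 'visit_' + node['type'])
        return method(node, value, **kwargs)

    def visit_field(self, node, value, scope=None, **kwargs):
        if isinstance(value, dict) and node['name'] in value:
            return value[node['name']]
        return (scope or {}).get(node['name'])

    def visit_let(self, node, value, scope=None, **kwargs):
        # Build a fresh dict merging the previous scope; there is nothing
        # to pop, since the caller's kwargs are untouched on return.
        new_scope = dict(scope or {})
        for name, expr in node['bindings'].items():
            new_scope[name] = self.visit(expr, value, scope=scope, **kwargs)
        return self.visit(node['expr'], value, scope=new_scope, **kwargs)
```

Because each nested visit() carries its own scope keyword argument, a deferred value (generator) evaluated later still sees the scope it was created with.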

innovate-invent avatar Mar 05 '21 01:03 innovate-invent

It looks like this feature is picking up interest again, so to help move this discussion forward, here are my thoughts coming back to this after a while:

I'm mostly convinced that a let() function is the wrong thing to do. We are fundamentally changing how we reference values from the semantics of an implicit "current node" context to this idea of binding and referencing values with scope. Putting this behind a special function that has the ability to change this process would not only be confusing to users (you just have to know that let() is magic) but also a source of errors that would be hard to track down. We've already seen this to some extent in various comments (which identifiers should be looked up in scope &foo.[a, b]), which leads to some incredibly unintuitive behavior with what I originally proposed:

>>> data = {'foo': {'bar': {}}}
>>> # Alternating results depending on whether the number of `.bar`'s is even or odd!
>>> jmespath.search('let({bar: foo}, &foo.bar)', data)
{}
>>> jmespath.search('let({bar: foo}, &foo.bar.bar)', data)
{'bar': {}}
>>> jmespath.search('let({bar: foo}, &foo.bar.bar.bar)', data)
{}
>>> jmespath.search('let({bar: foo}, &foo.bar.bar.bar.bar)', data)
{'bar': {}}
>>> jmespath.search('let({bar: foo}, &foo.bar.bar.bar.bar.bar)', data)
{}

So where does that leave us?

Before working through alternative proposals, here's the properties I think the design should have:

  1. There's a syntactic distinction between the existing "current node" lookups and scoped variable lookups.

Rationale: They are fundamentally different types of lookups with different evaluation rules (undefined variables should be an error), so having explicit syntax lets the user clearly state their intent. It also lets implementations more easily optimize variable lookups as we're not touching the existing current node lookup process. I liked the proposal with the $myvar syntax.

  2. We create new tokens/syntax to indicate we want to evaluate an expression with a given scope. That can be a combination of assignment tokens to denote bindings and/or a token to delimit the start of the evaluated expression, e.g. ->, =>, etc. I'm even open to the possibility of introducing keywords into the language. We'll have to consider the backwards compatibility constraints, but if it makes the expressions easier to read I'm open to it.

Rationale: In addition to simplifying the parser (a starting keyword/token would let a parser immediately know it should parse this as a scope lookup), it also simplifies the evaluation process because it defines exactly where the scope is valid. In looking at approaches other query/expression languages take, reusing an existing token such as | to denote the start of the expression to evaluate as well as an expr | expr pipe expression makes it harder to see which version you're using (is it a let expression or a pipe expression?). You also have to know the precedence rules of both to know the expression boundaries.

Updated proposal

So here's an updated proposal to get the ball rolling. The reason we originally proposed the let function was because just about every functional programming language has some let <bindings> in <expr> syntax:

  • Haskell: http://learnyouahaskell.com/syntax-in-functions#let-it-be
  • Clojure: https://clojuredocs.org/clojure.core/let
  • OCaml: https://v2.ocaml.org/manual/expr.html#sss:expr-localdef

So let's just use that syntax. In pseudo-grammar:

let-expr: let <bindings> in <expr>
bindings: varref = <expr>
        | placeholder for alternate destructuring syntax
varref: $<identifier>

Examples:

Basic usage, binding top level keys:

Expression:

let $newvar = top in foo.{foo: bar, other: $newvar}

Input:

{'foo': {'bar': 'baz'}, 'top': 'top-value'}

Result:

{'foo': 'baz', 'other': 'top-value'}

Chained scope showing a root key binding, an inner key binding, and an inline let expression within a multiselect. The let $each = @ in $each.z is the same thing as z, but I wanted to show you can use a let inline wherever you could normally use an expression:

Expression:

    let $topkey = top
    in
        bar |
        let $barscope = barscope
        in
            listval[*].[a, let $each = @ in $each.z, $barscope, $topkey.other]


Input:

{
    'foo': {'bar': 'baz'},
    'top': {'other': 'other-value'},
    'bar': {
        'listval': [{'a': 1, 'z': 5}, {'a': 2, 'z': 6}, {'a': 3, 'z': 7}],
        'barscope': 'innervar',
     }
}


Result:

[
  [1, 5, 'innervar', 'other-value'],
  [2, 6, 'innervar', 'other-value'],
  [3, 7, 'innervar', 'other-value']
]

Let me know what you all think. If this seems like the right direction, I can update the JEP and sketch out the python implementation to see if any issues come up.

cc @mtdowling

jamesls avatar Mar 10 '23 17:03 jamesls

@jamesls thanks for updating the proposal!

Am I right in thinking that in the new proposal the notion of scopes is gone?

I think we need to make sure things like let $newvar = top in foo.{foo: bar, other: $newvar.$newvar} is not allowed; basically it should not be a fallback of trying to access a field.

And in your example let each = @ in $each.z, shouldn't it be let $each = @ in $each.z ($ is missing when declaring the binding)?

eddycharly avatar Mar 10 '23 17:03 eddycharly

Am I right in thinking that in the new proposal the notion of scopes is gone?

Scopes are still there. A $var lookup will try to look up the variable in its current scope, then in its parent scope, in succession until there are no scopes left. So in the last example:

    let $topkey = top
    in
        bar |
        let $barscope = barscope
        in
            listval[*].[a, let $each = @ in $each.z, $barscope, $topkey.other]

The $each is pulled from the innermost scope (within the multiselect), the $barscope is pulled from the parent scope let $barscope = barscope, and the $topkey is pulled from the first let statement in the outermost scope. Similarly, the bindings are only valid within the expression of the let, so if the last line instead was:

    let $topkey = top
    in
        bar |
        let $barscope = barscope
        in
            listval[*].[a, let $each = @ in $each.z, $barscope, $topkey.other] | $each
                                                                                 ^^^^^
                                                                                 |
                                                                                 ---- Invalid, $each doesn't exist anymore                   

you'd get an error because $each doesn't exist anymore.

I think we need to make sure things like let $newvar = top in foo.{foo: bar, other: $newvar.$newvar} is not allowed, basically it should not be a fallback of trying to access a field.

Yep, that's why I like the idea of an explicit sigil for variable references, e.g. $foo. The grammar would have a new rule varref = '$' identifier, so $newvar.$newvar wouldn't be allowed by the parser because the sub-expression rule would still be:

sub-expression    = expression "." ( identifier /
                                     multi-select-list /
                                     multi-select-hash /
                                     function-expression /
                                     "*" )

And in your example let each = @ in $each.z, shouldn't it be let $each = @ in $each.z ($ is missing when declaring the binding)?

Good catch, updated.

jamesls avatar Mar 10 '23 18:03 jamesls

Scopes are still there. A $var lookup will try to look up the variable in its current scope, then in its parent scope, in succession until there are no scopes left. So in the last example:

    let $topkey = top
    in
        bar |
        let $barscope = barscope
        in
            listval[*].[a, let $each = @ in $each.z, $barscope, $topkey.other]

It looks to me that we don't need to chain scopes anymore, at any point a flat binding structure would be enough.

The $each is pulled from the innermost scope (within the multiselect), the $barscope is pulled from the parent scope let $barscope = barscope, and the $topkey is pulled from the first let statement in the outermost scope. Similarly, the bindings are only valid within the expression of the let, so if the last line instead was:

    let $topkey = top
    in
        bar |
        let $barscope = barscope
        in
            listval[*].[a, let $each = @ in $each.z, $barscope, $topkey.other] | $each
                                                                                 ^^^^^
                                                                                 |
                                                                                 ---- Invalid, $each doesn't exist anymore                   

you'd get an error because $each doesn't exist anymore.

You mean at compile time? Because we can now validate at compile time whether a reference to a binding is valid or not?

Yep, that's why I like the idea of an explicit sigil for variable references, e.g. $foo. The grammar would have a new rule varref = '$' identifier, so $newvar.$newvar wouldn't be allowed by the parser because the sub-expression rule would still be:

sub-expression    = expression "." ( identifier /
                                     multi-select-list /
                                     multi-select-hash /
                                     function-expression /
                                     "*" )

I guess $newvar.$newvar wouldn't translate to:

{
  "type": "Subexpression",
  "children": [
    {
      "type": "Field",
      "name": "$newvar"
    },
    {
      "type": "Field",
      "name": "$newvar"
    }
  ],
  "jmespathType": "Expref"
}

Right ?

eddycharly avatar Mar 10 '23 18:03 eddycharly

It looks to me that we don't need to chain scopes anymore, at any point a flat binding structure would be enough.

Keep in mind you can shadow variables from an outer scope. Here's a somewhat convoluted example that demonstrates the idea. In this example, suppose I'm calling this from some host language where the input data is called root, and the evaluation result is called results (and pretend jmespath has comments with //).

Input:

root = {
    'key': 'rootvalue',
    'subscope1': {
        'key': 'subscope1-value',
        'subscope2': {
            'key': 'subscope2-value'
        }
    }
}

Expression:

let $scope = @                        // $scope is `root`
in [
  $scope.key,                         // <---- results[0]
  $scope.subscope1.[                  // "Current node" changes from sub-expr
    let $scope = @                    // $scope is now `root['subscope1']`
    in [
      $scope.key,                     // <---- results[1]
      $scope.subscope2.[              // "Current node" changes from sub-expr
        let $scope = @                // $scope is now `root['subscope1']['subscope2']`
        in $scope.key                 // <---- results[2]
      ],                              // $scope is now back to `root['subscope1']`
      $scope.key                      // <---- results[3]
    ][]                               // $scope is now back to `root`
  ],
  $scope.key                          // <---- results[4]
][][]



Result:

results = ['rootvalue', 'subscope1-value', 'subscope2-value', 'subscope1-value', 'rootvalue']

Notice how every expression in the results list is $scope.key, but depending on where we are in the scope chain, they can evaluate to different values. So we still need the concept of pushing/popping scopes to support this lexical scoping.

You mean at compile time? Because we can now validate at compile time whether a reference to a binding is valid or not?

In theory yes. Several tools let you provide an initial scope as part of the evaluation (e.g. to seed in environment variables when evaluating jmespath from a CLI or in general to pull in data from the outside world), so you wouldn't be able to verify it at compile time, but you could validate it before evaluating the top level expression by collecting the set of free variables in the top level closure and verifying that the initial seed scope binds all the free variables. At any rate, I wouldn't want the spec to require that you fail at compile time for free variables to allow implementations to support this use case. I would like to have a minimum requirement that a runtime failure occurs for any references to variables that don't exist.
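Collecting the set of free variables ahead of evaluation could look like the following; the node shapes are hypothetical:

```python
def free_variables(node, bound=frozenset()):
    """Return the set of variable names referenced but not bound by a let."""
    if node['type'] == 'variable-ref':
        return set() if node['name'] in bound else {node['name']}
    if node['type'] == 'let':
        free = set()
        for name, expr in node['bindings']:
            # Binding expressions are evaluated in the *outer* scope.
            free |= free_variables(expr, bound)
        inner = bound | {name for name, _ in node['bindings']}
        return free | free_variables(node['expr'], inner)
    free = set()
    for child in node.get('children', []):
        free |= free_variables(child, bound)
    return free
```

An implementation could then check that a seed scope covers free_variables(ast) before evaluating the top level expression, while still deferring to a runtime error otherwise.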

I guess $newvar.$newvar wouldn't translate to:

No. I haven't worked out the exact grammar rules, but to translate the pseudo-grammar I initially used into ABNF, roughly:

let-expression: "let" bindings "in" expression
bindings: variable-ref "=" expression
variable-ref: "$" identifier

So there's no expref (that comes from the & char), and a $ isn't a valid starting character for a Field, so you'd never be able to have that node either. Something like $foo.bar would be:

type: sub-expression
children:
  - type: variable-ref
    name: foo
  - type: field
    name: bar

jamesls avatar Mar 10 '23 19:03 jamesls

Keep in mind you can shadow variables from an outer scope. Here's a somewhat convoluted example that demonstrates the idea. In this example, suppose I'm calling this from some host language where the input data is called root, and the evaluation result is called results (and pretend jmespath has comments with //).

Input:

root = {
    'key': 'rootvalue',
    'subscope1': {
        'key': 'subscope1-value',
        'subscope2': {
            'key': 'subscope2-value'
        }
    }
}

Expression:

let $scope = @                        // $scope is `root`
in [
  $scope.key,                         // <---- results[0]
  $scope.subscope1.[                  // "Current node" changes from sub-expr
    let $scope = @                    // $scope is now `root['subscope1']`
    in [
      $scope.key,                     // <---- results[1]
      $scope.subscope2.[              // "Current node" changes from sub-expr
        let $scope = @                // $scope is now `root['subscope1']['subscope2']`
        in $scope.key                 // <---- results[2]
      ],                              // $scope is now back to `root['subscope1']`
      $scope.key                      // <---- results[3]
    ][]                               // $scope is now back to `root`
  ],
  $scope.key                          // <---- results[4]
][][]



Result:

results = ['rootvalue', 'subscope1-value', 'subscope2-value', 'subscope1-value', 'rootvalue']

Notice how every expression in the results list is $scope.key, but depending on where we are in the scope chain, they can evaluate to different values. So we still need the concept of pushing/popping scopes to support this lexical scoping.

Still, I don't quite get why a flat structure can make it:

Input:

root = {
    'key': 'rootvalue',
    'subscope1': {
        'key': 'subscope1-value',
        'subscope2': {
            'key': 'subscope2-value'
        }
    }
}

Expression:

let $scope = @                        // bindings = { $scope = `root` }
in [
  $scope.key,                         // <---- results[0]
  $scope.subscope1.[                  // "Current node" changes from sub-expr
    let $scope = @                    // bindings = { $scope = `root['subscope1']` }
    in [
      $scope.key,                     // <---- results[1]
      $scope.subscope2.[              // "Current node" changes from sub-expr
        let $scope = @                // bindings = { $scope = `root['subscope1']['subscope2']` }
        in $scope.key                 // <---- results[2]
      ],                              // bindings are now back to `root['subscope1']`
      $scope.key                      // <---- results[3]
    ][]                               // bindings are now back to `root`
  ],
  $scope.key                          // <---- results[4]
][][]

Result:

results = ['rootvalue', 'subscope1-value', 'subscope2-value', 'subscope1-value', 'rootvalue']

Bindings should be treated as immutable: writing a new key in a binding should not modify it but return a new binding that will be used in all sub-expressions.

Current implementations usually look like interpreter.Execute(node, data) (node is the current ast node, data is the input object). We just need to change it to interpreter.Execute(node, data, bindings).

In theory yes. Several tools let you provide an initial scope as part of the evaluation (e.g. to seed in environment variables when evaluating jmespath from a CLI or in general to pull in data from the outside world), so you wouldn't be able to verify it at compile time, but you could validate it before evaluating the top level expression by collecting the set of free variables in the top level closure and verifying that the initial seed scope binds all the free variables. At any rate, I wouldn't want the spec to require that you fail at compile time for free variables to allow implementations to support this use case. I would like to have a minimum requirement that a runtime failure occurs for any references to variables that don't exist.

Ok, so you want to allow referencing a binding that hasn't been previously declared? This is a detail, but most languages will fail to compile when referencing an undeclared variable. I guess we can easily support both with different compilation functions though (Compile vs CompileStrict).

No. I haven't worked out the exact grammar rules, but to translate the pseudo-grammar I initially used into ABNF, roughly:

let-expression: "let" bindings "in" expression
bindings: variable-ref "=" expression
variable-ref: "$" identifier

So there's no expref (that comes from the & char), and a $ isn't a valid starting character for a Field, so you'd never be able to have that node either. Something like $foo.bar would be:

type: sub-expression
children:
  - type: variable-ref
    name: foo
  - type: field
    name: bar

Got it, type: variable-ref answers my question: resolving a binding becomes a specific AST node, not just a fallback in the implementation of the field node 👍

eddycharly avatar Mar 10 '23 20:03 eddycharly

To illustrate the idea of bindings in a flat structure (map):

// flat:    {}
// chained: `null`
    let $topkey = top
    // flat:    { $topkey = `top` }
    // chained: { $topkey = `top` } -> `null`
    in
        bar |
        let $barscope = barscope
        // flat:    { $topkey = `top`, $barscope = `barscope` }
        // chained: { $barscope = `barscope` } -> { $topkey = `top` } -> `null`
        in
            ....

Every time a let is started we clone the current bindings map, add the new binding (or overwrite an existing one), and this becomes the new bindings map used to evaluate nodes in the in expression. (Actually I don't really care about flat vs chained, more about immutability; I would like to avoid a stack-based approach, I don't want to push/pop scopes.)
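The clone-and-merge step described above is a few lines of Python per let; with_bindings is a hypothetical helper name:

```python
def with_bindings(bindings, new_bindings):
    """Return a fresh map: the old bindings plus the new ones (new wins)."""
    merged = dict(bindings)      # O(n) copy keeps the caller's map untouched
    merged.update(new_bindings)
    return merged
```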

eddycharly avatar Mar 10 '23 20:03 eddycharly

Still, I don't quite get why a flat structure can make it:

Ahh, got it, you're asking about implementation details, that's what I was missing. Thought we were still discussing whether or not there will be lexical scoping, which there will be with this proposal.

The spec should be careful to avoid requiring a specific implementation, so libraries are free to implement it however they'd like provided all the compliance tests pass. The spec will probably just say "lexical scope" with an explanation similar to this that talks about variables in terms of their visibility/lifetime and avoid any talk of pushing/popping scope.

The chained thing was just how I've implemented this type of thing in the past and a common way to implement it. Personally, I'd avoid taking an O(n) copy each time you enter a new scope. You also don't have to mutate anything: you could do scope = makeScope(newbindings, scope) on entering a new scope, and have scope be a struct of type scope struct { bindings map[string]whatever; parent *Scope }, but again implementations are free to handle this however they'd like.
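In Python, the makeScope approach sketched above might look like this (names are hypothetical; one of many valid implementations):

```python
class Scope:
    """A linked scope frame: O(1) to enter, no copying, nothing mutated."""

    def __init__(self, bindings, parent=None):
        self.bindings = bindings
        self.parent = parent


def make_scope(new_bindings, parent):
    return Scope(new_bindings, parent)


def resolve(scope, name):
    # Walk outward through the chain; the innermost binding wins.
    while scope is not None:
        if name in scope.bindings:
            return scope.bindings[name]
        scope = scope.parent
    raise KeyError("undefined variable: $" + name)
```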

At any rate, I think that's getting ahead of ourselves. Right now, I'd like to focus on whether or not this feature is useful from an end user's perspective and if there's demand for this. I'd rather start with the ideal syntax/semantics for users, and then modify if needed if it'll create implementation difficulties.

jamesls avatar Mar 10 '23 22:03 jamesls

I agree it's more on the implementation side and is out of scope in this discussion.

To me this is an extremely useful feature; allowing references to parent elements opens doors to more complex/interesting queries. I was a huge fan of the proposal until I started playing with it and discovered the confusing part of it. Now, if we can remove the confusing bits, it would be my most wanted feature.

eddycharly avatar Mar 10 '23 22:03 eddycharly

@jamesls thank you very much for weighing in!

I really appreciate your new insight into this feature.

I strongly think lexical scoping is needed, and I think we are all mostly convinced that what brings confusion when using the let() function is not the let() function itself but, as you rightly pointed out, the scoped-variable lookup.

I must admit I was a bit confused at first glance by your examples of the let $var in <expression> syntax, but it makes sense overall.

Following your discussion with @eddycharly, I'm coming to the conclusion that JEP-11 as it stands is mostly complete. It would only require a slight update mandating a new sigil like $var, as proposed here, to remove the most confusing usages and to prevent using scopes as a fallback when evaluating an identifier on the right-hand side of a sub-expression. @eddycharly, that's indeed what you suggested when coming up with a potential ref() function to explicitly access the scope.

That said, I do not dislike this updated proposal, although maybe we should explore alternate tokens as well, as reserved keywords may feel a bit foreign if limited to this single usage.

Scoped variable lookups

For the record, your first example would be possible with an equivalent JEP-11 expression. It would work, while still exhibiting the undesirable fallback behaviour. So I will use a hypothetically updated syntax with the $var sigil as well.

Given:

{"foo": {"bar": "baz"}, "top": "top-value"}

The following two expressions are equivalent:

  • proposal: let $newvar = top in foo.{foo: bar, other: $newvar}
  • jep-11 equivalent: let({newvar: top}, &foo.{foo: bar, other: $newvar})

Given:

{
    "foo": {"bar": "baz"},
    "top": {"other": "other-value"},
    "bar": {
        "listval": [
            {"a": 1, "z": 5},
            {"a": 2, "z": 6},
            {"a": 3, "z": 7}
        ],
        "barscope": "innervar"
    }
}

Your proposed expression:

let $topkey = top
    in
        bar |
        let $barscope = barscope
        in
            listval[*].[a, let $each = @ in $each.z, $barscope, $topkey.other]

The jep-11 equivalent expression would be:

let(
  {topkey: top},
  &bar|let(
    {barscope: barscope},
    &listval[*].[a, let({each: @}, &$each.z), $barscope, $topkey.other]))

While the notion of lexical scope will not go away, it would be entirely controlled by the nesting of expressions.
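To check my reading of the semantics, here is the nested-scope evaluation above written out as plain Python over the same input document (a hand-translation to show the expected result, not an implementation):

```python
data = {
    "foo": {"bar": "baz"},
    "top": {"other": "other-value"},
    "bar": {
        "listval": [{"a": 1, "z": 5}, {"a": 2, "z": 6}, {"a": 3, "z": 7}],
        "barscope": "innervar",
    },
}

topkey = data["top"]          # let $topkey = top in ...
bar = data["bar"]             # ... bar |
barscope = bar["barscope"]    # let $barscope = barscope in ...
result = [
    [each["a"], each["z"], barscope, topkey["other"]]
    for each in bar["listval"]  # listval[*].[a, $each.z, $barscope, $topkey.other]
]
```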

My only qualm with this syntax using reserved keywords is that the in keyword might be confusing. It reads as if $var is taken from the next part of the expression, whereas the full scope is determined unambiguously right before in.

So maybe an alternate keyword would be more intuitive? What about then, as in:

  • let $newvar = top then foo.{foo: bar, other: $newvar}

Or a new set of keywords like:

  • with $newvar = top eval foo.{foo: bar, other: $newvar}

New tokens / syntax

Introducing new keywords into the language seems a bit too drastic a change at first, but I would welcome such a change.

It could pave the way for more simplifications in the future. For instance, this would allow us to abandon the backtick JSON-literals, which would be rendered useless, provided we:

  • introduce true, false and null keywords;
  • support raw number literals without backticks.

Together with multi-select-hash and multi-select-list, emitting JSON would be entirely possible without using backticks at all.

Exploring this proposal with new tokens, I would like to suggest the following expressions for the two examples you have shown:

  • $newvar := top => foo.{foo: bar, other: $newvar}

Assignment of scope would be done using = or := tokens.

I toyed with the idea of introducing lambda-expression constructs to the language while discussing a potential reduce feature, so the => token could introduce the remainder of the expression to evaluate.

The second example would look like so (and we really need parsers to keep track of line and column numbers to accurately report errors 😁):

$topkey := top => 
  bar |
  $barscope := barscope =>
    listval[*].[
      a,
      $each := @ => $each.z,
      $barscope,
      $topkey.other
    ]

springcomp avatar Mar 11 '23 07:03 springcomp

I think the alternatives thus far all have in common that referring to a scoped variable should be explicit, rather than the result of a fallback when an identifier evaluates to null.

So, as always, the alternatives fall into two categories: new syntax/tokens vs. new functions.

The updated proposal here introduces new keywords as well, which would open up yet more potential for JMESPath in the future 🙂.

Scope variable lookup

  • New syntax $var (this proposal)
  • New syntax *var (dereference syntax)
  • New function lookup('var')

A new function would be cumbersome, as the identifier needs to be specified as a string. However, it would allow dynamically constructing the identifier to look up from the scopes, opening up new possibilities.
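The difference between the fallback behaviour and an explicit sigil can be sketched as two lookup strategies (the helper names are hypothetical, purely for illustration):

```python
def lookup_with_fallback(name, current, scope):
    """JEP-11 behaviour: try the current object first, then the scope."""
    if isinstance(current, dict) and name in current:
        return current[name]
    # Confusing part: silently falls back to the scope when the field is absent
    return scope.get(name)

def lookup_explicit(name, current, scope):
    """Proposed behaviour: `$name` reads the scope only, `name` the current object only."""
    if name.startswith("$"):
        return scope.get(name[1:])
    return current.get(name) if isinstance(current, dict) else None

doc = {"foo": "from-document"}
bindings = {"foo": "from-scope", "newvar": "top-value"}
```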

Creating scope

  • JEP-11 let({scope: expression}, &expression)
  • Assignment $scope := expression
  • This proposal let $scope = expression

As it stands, I liked introducing a scope in the let() function, and I would favor using the same approach in the future when specifying the initial seed value in a potential reduce() function.

springcomp avatar Mar 11 '23 08:03 springcomp

Another approach that could work without modifying the current grammar:

  • let defines lexical scopes and nothing more
  • create another function to make the scope chain the current context (for example in)
let(
  {topkey: top},
  &bar | let(
    {barscope: barscope},
    &listval[*].[a, let({each: @}, &in(&each.z)), in(&barscope), in(&topkey.other)]))

Basically when inside the in function the current context is the scope chain.

This would allow the creation of isolated scopes that do not inherit from the parent too.

eddycharly avatar Mar 11 '23 10:03 eddycharly

// scope chain: {}
let(
  // scope chain: {root: @} -> {}
  {root: @},
  &in(
    // we enter `in`, scope chain becomes the current context and scope chain is reset to {}
    let(
      // scope chain: {newroot: @} -> {}
      {newroot: @},
      &in(
        // we enter `in`, scope chain becomes the current context and scope chain is reset to {}
        // here the parent scope was not inherited and root.field does not exist, it should be newroot.root.field
        &root.field
      )
    )
  )
)

EDIT:

With the design above, I wonder if the scope chain makes sense: the first argument of let produces the new lexical scope, and this lexical scope can become the current context by invoking in.

eddycharly avatar Mar 11 '23 11:03 eddycharly

Same thing without the notion of scope chains, just a single lexical scope:

// lexical scope: { foo: "baz" }
// current context: { foo: "bar" }
let(
  // lexical scope becomes: { root: { foo: "bar" }, parent: { foo: "baz" } }
  { root: @, parent: in(&@) },
  &in(
    // we enter `in`, lexical scope is reset to null
    // current context is now: { root: { foo: "bar" }, parent: { foo: "baz" } }
    let(
      // lexical scope becomes: { newroot: { root: { foo: "bar" }, parent: { foo: "baz" } } }
      { newroot: @ },
      &in(
        // we enter `in`, lexical scope is reset to null
        // current context is now: { newroot: { root: { foo: "bar" }, parent: { foo: "baz" } } }
        &[ newroot.root.foo, newroot.parent.foo ]
      )
    )
  )
)

The in function could also be replaced by a lexical_scope function returning the current lexical scope:

// lexical scope: { foo: "baz" }
// current context: { foo: "bar" }
let(
  // lexical scope becomes: { root: { foo: "bar" }, parent: { foo: "baz" } }
  // current context is not modified
  { root: @, parent: lexical_scope() },
  &let(
    // lexical scope becomes: { newroot: { root: { foo: "bar" }, parent: { foo: "baz" } } }
    // current context is not modified
    { newroot: lexical_scope() },
    &lexical_scope() | [ newroot.root.foo, newroot.parent.foo ]
  )
)

eddycharly avatar Mar 11 '23 11:03 eddycharly

@jamesls @springcomp WDYT? Do we need something more complicated than what I described above?

eddycharly avatar Mar 11 '23 16:03 eddycharly

I like the idea of dedicated syntax for this since I think it can make expressions more readable than frequent usage of "&", and it doesn't require making the let function special. We'd probably make it the only construct that could introduce new scoped variables. I don't think other functions have arguments that cause side effects for other function arguments either (each argument is isolated and based only on the current node).

I like the “$” syntax to access variables too as that removes ambiguity around whether a value is from current node or scoping.

The discussion around new keywords is super interesting. Adding true/false/null keywords and non-backtick numbers would be nice. I’d be concerned about it breaking existing expressions though. Maybe this should be a different discussion than this JEP though.

mtdowling avatar Mar 11 '23 16:03 mtdowling

I don’t think other functions have arguments that cause side effects for other function arguments either

@mtdowling while brainstorming a potential design for reduce, we struggled to find a satisfying design. IMHO, a function that takes an object specifying the accumulator identifier and its seed value is the most elegant solution. This would also neatly complement the existing map() function.

This would leverage the pattern introduced by let() as initially proposed.

A reduce does have a side effect on its second expression argument. In that case reduce() and let() would share a consistent syntax.
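As a sketch of what I have in mind (entirely hypothetical; neither the function name nor the seed-object convention is part of any spec), a reduce({acc: seed}, &expr) built on the same scoped-binding pattern could behave like:

```python
from functools import reduce as py_reduce

def jmespath_like_reduce(seed_bindings, step, items):
    """Hypothetical reduce(): bind the accumulator in scope, evaluate `step`
    against each element, and rebind the result for the next element."""
    (name, seed), = seed_bindings.items()  # e.g. {"acc": 0}

    def fold(acc, element):
        scope = {name: acc}          # the accumulator is a scoped variable
        return step(element, scope)  # step plays the role of the &expression

    return py_reduce(fold, items, seed)

# Analogous to something like: reduce({acc: `0`}, &add($acc, @))
total = jmespath_like_reduce(
    {"acc": 0}, lambda el, scope: scope["acc"] + el, [1, 2, 3]
)
```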

The discussion around new keywords is super interesting. Adding true/false/null keywords and non-backtick numbers would be nice. I’d be concerned about it breaking existing expressions though. Maybe this should be a different discussion than this JEP though.

Of course, this topic is a separate discussion. However, we found that cross-pollination of ideas makes the overall design more consistent.

springcomp avatar Mar 11 '23 18:03 springcomp