Co-expressions
@wader first made me aware of @nicowilliams's co-expressions PR. I'm not sure yet whether I understand it correctly. But even if what follows is not what that PR is about, I think that it would still be interesting in its own right.
So what I understand is that a co-expression allows you to capture all outputs of an expression into a variable, such that each further occurrence of that variable yields the next output of the expression. For example:
(1, 2, 3) as $@s | ($@s, $@s)
would yield 1, 2.
(I just made up a bit of syntax. I think that @nicowilliams uses just @s to denote a "co-variable", but this might cause conflicts with the escaping syntax @uri "foo\(.)bla". Let's not bikeshed for now, that can be done later. ^^)
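As a rough analogy in Python rather than jq, a co-variable can be modelled as an iterator over the captured expression, with each mention of `$@s` pulling at most one output. The helper name `pull` is made up for this sketch and is not part of any proposal.

```python
def pull(covar):
    """Model one mention of $@s: the next output, or nothing if exhausted."""
    for value in covar:
        yield value
        return  # stop after a single output


# (1, 2, 3) as $@s | ($@s, $@s)
s = iter((1, 2, 3))             # bind the co-variable once
outputs = [*pull(s), *pull(s)]  # two mentions pull two successive outputs
print(outputs)  # [1, 2]
```

The third output, 3, is never requested, so it stays unconsumed in `s`.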
I was quite excited to realise that you can implement zip/2 quite easily with such a construct, which you cannot do with current jq:
def zip($@x; $@y): def rec: $@x as $x | $@y as $y | ([$x, $y], rec); rec;
(I suppose that if the expression that has been captured by a co-variable $@x does not yield any more outputs, then $@x simply yields empty.)
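The same idea can be sketched in Python, assuming that `rec` is meant to call itself after emitting each pair and that an exhausted co-variable yields nothing. The names `pull` and `zip_co` are hypothetical and exist only for this illustration.

```python
def pull(covar):
    """One mention of a co-variable: the next output, or nothing if exhausted."""
    for value in covar:
        yield value
        return


def zip_co(x, y):
    """Model of: def rec: $@x as $x | $@y as $y | ([$x, $y], rec); rec"""
    for a in pull(x):                 # $@x as $x (no output: recursion stops)
        for b in pull(y):             # $@y as $y
            yield [a, b]
            yield from zip_co(x, y)   # rec


pairs = list(zip_co(iter([1, 2, 3]), iter("ab")))
print(pairs)  # [[1, 'a'], [2, 'b']]
```

Note that in this model the 3 is still pulled from `x` before `y` turns out to be empty, so the longer stream loses one element.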
Furthermore, this would allow us to implement inputs / input very naturally.
In particular, just like $ENV is currently bound on the top-level in every jq program, we could bind $@input on the toplevel, which would allow us to write:
def inputs: $@input | ., inputs;
def input : $@input;
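A small Python model of these two definitions, under the assumption that `$@input` behaves like a single shared iterator over the program's inputs. The class name `Program` is invented for this sketch.

```python
class Program:
    """Hypothetical top-level environment holding the $@input co-variable."""

    def __init__(self, source):
        self._input = iter(source)  # plays the role of $@input

    def input(self):
        # def input: $@input;  (at most one output)
        for value in self._input:
            yield value
            return

    def inputs(self):
        # def inputs: $@input | ., inputs;  (all remaining outputs)
        for value in self.input():
            yield value
            yield from self.inputs()


prog = Program([10, 20, 30])
first = list(prog.input())
rest = list(prog.inputs())
print(first, rest)  # [10] [20, 30]
```

Because both definitions pull from the same iterator, a call to `input` changes what a later `inputs` will see. The recursion in `inputs` mirrors the jq definition and is only suitable for short inputs in Python.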
And now for something completely different: foreach. I found that we can even simulate that with co-expressions:
foreach xs as $x (init; update; project) ===
init as $init | xs as $@x | $init |
def rec: label $exit | ifempty($@x; break $exit) as $x | update($x) | (project($x), rec); rec
Here, ifempty(f; g) is a bit like f // g, except that it calls g only if f is empty; otherwise it returns all outputs of f.
Of course, this version of foreach is not as good performance-wise as the built-in foreach, e.g. because the label breaks tail recursion and so on. I'm not proposing here to replace the built-in foreach by what is here. But it is at least a very good sign that you can simulate built-in operators with this approach which you could not even closely simulate before.
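The shape of this simulation can be sketched in Python, under the simplifying assumption that `update` and `project` each produce exactly one output (in jq they may produce several). The function and parameter names are illustrative only.

```python
def foreach(xs, init, update, project):
    x_co = iter(xs)  # xs as $@x

    def rec(state):
        for x in x_co:                    # pull one output of $@x
            new_state = update(state, x)  # update($x)
            yield project(new_state, x)   # project($x)
            yield from rec(new_state)     # rec
            return
        # Falling through here is the `break $exit` case: $@x was empty.

    return rec(init)


running_sums = list(
    foreach([1, 2, 3, 4], 0, lambda acc, x: acc + x, lambda acc, x: acc)
)
print(running_sums)  # [1, 3, 6, 10]
```

As in the jq version, each step nests one level deeper, which is why this is not a performance-friendly replacement for a built-in loop.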
All in all, I am pretty excited about this. Even if it would definitely change the characteristics of jq towards a language with side effects, I think that the gained power could be worth it. I also think that I could implement this in my jq implementation jaq with quite little effort.
If somebody would extract the relevant parts of @nicowilliams's PR and make a smaller PR out of it that contains only the co-expression functionality, then I think I would be motivated to go ahead and implement this in jaq.
What do you think?
@01mf02 I'm glad you like it. I don't have the energy or time to finish co-expressions, but having those in jq would really give jq a boost. If you'd like to do it for jaq first, or only for jaq, as long as we agree on syntax and semantics I think that would be fine.
@nicowilliams, I understand! Given that you have invested so much work in this, I would be very curious about what you think about my proposed semantics. Does what I describe above look reasonable to you? Also, what do you think about the syntax?
P.S.: And all the best to you to recover energy / time! :)
Are the semantics all in this issue? If so,
So what I understand is that a co-expression allows you to capture all outputs of an expression into a variable, such that each further occasion of that variable yields the next output of the expression.
Well, it captures the state of an expression without evaluating it fully first. This can seem like details, but it's important details since the co-expression might not finish -- think of an infinite list in Haskell w/ lazy evaluation.
It basically forces the implementation to do something like use threads (green perhaps) or have multiple VMs. Naturally I chose to make sure that the jq_state machinery in jq is re-entrant so I can invoke a jq VM from inside another, then all I had to do was make sure that co-expression VMs get cleaned up when the co-expression goes out of scope.
BTW, regarding input and inputs... The following is a total tangent. Just a stream of thought.
Icon has a number of expressions and functions which are generators but which generate one output with a side-effect, then on resumption they undo the side-effect and generate one last output or else backtrack. This is very useful for parsing stuff, but also it tends to make the language more referentially transparent, and thus more functional.
jq does not have a lot of side-effecting functions, but input and inputs very much have side-effects: namely they change what inputs will be seen in the future. Now a side-effect-undoing inputs is not really plausible because we'd have to buffer all of its outputs, but a side-effect-undoing version of input is possible because we only have to buffer one output.
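A minimal Python sketch of the one-item buffer idea, assuming that "undoing" a read means making the same value available to the next read. The class `UndoableInput` and its method names are hypothetical and do not correspond to anything in jq.

```python
class UndoableInput:
    """Hypothetical input source whose most recent read can be undone."""

    _EMPTY = object()

    def __init__(self, source):
        self._source = iter(source)
        self._pushed_back = self._EMPTY  # one-slot buffer

    def take(self):
        """Read one input (the side effect)."""
        if self._pushed_back is not self._EMPTY:
            value, self._pushed_back = self._pushed_back, self._EMPTY
            return value
        return next(self._source)

    def undo(self, value):
        """Put a value back so that the next read sees it again."""
        if self._pushed_back is not self._EMPTY:
            raise RuntimeError("only one input can be buffered")
        self._pushed_back = value

    def tentative(self):
        """Yield one input; if resumed (backtracking), undo the read."""
        try:
            value = self.take()
        except StopIteration:
            return
        yield value
        self.undo(value)


src = UndoableInput([1, 2, 3])
for v in src.tentative():
    pass                  # loop runs to the end: generator resumed, read undone
again = src.take()        # 1 is seen again
for w in src.tentative():
    break                 # leaving early keeps the read
print(again, w, src.take())  # 1 2 3
```

Only one slot is needed because a single tentative read is outstanding at a time, which matches the reasoning that an undoing `inputs` would need an unbounded buffer instead.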
One of the things I'd like to maybe figure out how to do is to add something like Icon's "string scanning context" concept. In Icon the string scanning context consists of two dynamic variables (think Lisp dynamics) &subject (typically a string) and &pos (typically an integer index into &subject). In jq I think we could add such a thing, but if I had to do Icon-style parsing w/o adding dynamics to jq I'd use a . that has structure like: {s: <the subject>, p: <position>, t: <last token output>}, then I'd have Icon-style functions to try one thing, then undo it. Except I think this breaks because in Icon a function ("procedure") was not a single expression but a bunch of statements.
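To illustrate the `{s, p, t}` structure, here is a Python sketch in which matchers are generators over an immutable-by-convention context, so that a failed attempt needs no explicit undo. The function names `match` and `either` are invented here, and this does not address the concern about Icon procedures being sequences of statements.

```python
def match(ctx, literal):
    """Try to match `literal` at position p; yield the advanced context."""
    s, p = ctx["s"], ctx["p"]
    if s.startswith(literal, p):
        yield {"s": s, "p": p + len(literal), "t": literal}
    # On failure nothing is yielded. ctx is never mutated, so "undo" is free.


def either(ctx, *literals):
    """Try each alternative in turn, backtracking to ctx between attempts."""
    for literal in literals:
        yield from match(ctx, literal)


start = {"s": "foobar", "p": 0, "t": None}
results = [
    after_bar
    for after_prefix in either(start, "fo", "foo")
    for after_bar in match(after_prefix, "bar")
]
print(results)  # [{'s': 'foobar', 'p': 6, 't': 'bar'}]
```

The first alternative, "fo", succeeds but leaves "obar", so the following match fails and the second alternative is tried from the original position.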
As to syntax, the reason I used @<name> was only really that Icon does that too :)
I understand the conflict with @<format> and am a bit sad about it. Options to deal with that include (bikeshed alert):
- $@<name>
- @@<name>
- @{<name>}
- reserve @<name> for names of formats jq has (ok, this is probably difficult)
In particular, just like $ENV is currently bound on the top-level in every jq program, we could bind $@input on the toplevel, which would allow us to write:
Yes, quite. In jq one can always override builtins this way, though one should not override _... builtins.
Well, it captures the state of an expression without evaluating it fully first. This can seem like details, but it's important details since the co-expression might not finish -- think of an infinite list in Haskell w/ lazy evaluation.
Yes, I'm aware of that.
It basically forces the implementation to do something like use threads (green perhaps) or have multiple VMs. Naturally I chose to make sure that the jq_state machinery in jq is re-entrant so I can invoke a jq VM from inside another, then all I had to do was make sure that co-expression VMs get cleaned up when the co-expression goes out of scope.
At this point, I'm more interested in how the language is supposed to behave than in how the jq implementation does it. That's because in jaq, due to its very different approach to jq program execution, I will not need any kind of VM or threads to have co-expressions. My plan is to just generalise the machinery I created for inputs.
To understand the supposed behaviour of co-expressions, it is crucial for me to have an answer to my initial question in this issue. To address your point that co-expressions might not finish, let me slightly adapt the question to the following:
Would range(infinite) as $@s | ($@s, $@s) yield 0, 1?
If you could answer this question, that would help me a lot.
As to syntax, the reason I used @<name> was only really that Icon does that too :)
I understand the conflict with @<format> and am a bit sad about it. Options to deal with that include (bikeshed alert):
- $@<name>
- @@<name>
- @{<name>}
- reserve @<name> for names of formats jq has (ok, this is probably difficult)
I think that co-variables should start with $, since all other variable-like things (variables, labels) also start with $.
Furthermore, I think that co-variables should be syntactically distinguishable from regular variables, in order to prevent confusion when $x, $x suddenly yields two different outputs because $x is a co-variable. So that's why I'm leaning still towards $@x. The fact that $@x is longer than $x is also a good thing, IMO, because co-variables probably should be used more sparingly than regular variables.
Or what do you think about $&x? For me, & evokes a reference to something, whereas @ somehow evokes more of a label for me.
In my perfect jq world, I would have probably introduced something like f as $x | ..., label @x | ..., and f as &x. That way, there would not be confusion between the different kinds of bindings. Ah well, we have to make do with what we have. 🤷
Would range(infinite) as $@s | ($@s, $@s) yield 0, 1?
Yes, of course. The co-expression would retain its state between invocations.
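A Python sketch of this behaviour, modelling the co-expression as a generator that records how far it has been evaluated. The names `naturals`, `evaluated`, and `pull` are made up for the illustration.

```python
evaluated = []  # records which outputs have actually been computed


def naturals():
    """Stand-in for range(infinite): never finishes on its own."""
    n = 0
    while True:
        evaluated.append(n)
        yield n
        n += 1


def pull(covar):
    """One mention of $@s: the next output, or nothing if exhausted."""
    for value in covar:
        yield value
        return


s = naturals()                  # binding the co-variable evaluates nothing
before = list(evaluated)
outputs = [*pull(s), *pull(s)]  # ($@s, $@s)
print(before, outputs, evaluated)  # [] [0, 1] [0, 1]
```

Only the two requested outputs are ever computed, and the state retained in `s` means a third mention would continue with 2.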
Notice though that co-expressions are not values. You can't store them in arrays or objects, for example. You can't quite pass them as values either, but fortunately jq function arguments are expressions, not values, so that works out.
Today $... are immutable lexical bindings for values -- constant-like. So $... for co-expressions might be confusing given that they are not values and their state mutates when invoked? But then having it be $@... may well suffice on that score, and since that is just bikeshedding, if you ship this feature first then it can be your way :)
In my perfect jq world, I would have probably introduced something like f as $x | ..., label @x | ..., and f as &x. That way, there would not be confusion between the different kinds of bindings. Ah well, we have to make do with what we have. 🤷
In a perfect world I might have used $x = ... | instead of ... as $x |... We'd have to get a hold of @stedolan and pick his brain.
However, I was inspired by Icon's syntax for creating co-expressions.
Would range(infinite) as $@s | ($@s, $@s) yield 0, 1?
Yes, of course. The co-expression would retain its state between invocations.
Thanks for confirming!
Notice though that co-expressions are not values. You can't store them in arrays or objects, for example. You can't quite pass them as values either, but fortunately jq function arguments are expressions, not values, so that works out.
We're on the same page here.
Today $... are immutable lexical bindings for values -- constant-like. So $... for co-expressions might be confusing given that they are not values and their state mutates when invoked? But then having it be $@... may well suffice on that score, and since that is just bikeshedding, if you ship this feature first then it can be your way :)
We just have to teach users that $@... captures a generator (AKA iterator), not a value. It might still take some time for people to wrap their heads around this, but hey, I heard that jq has this effect on people in general. ;)
Thanks for confirming!
They wouldn't be co-routines otherwise!
Notice though that co-expressions are not values. You can't store them in arrays or objects, for example. You can't quite pass them as values either, but fortunately jq function arguments are expressions, not values, so that works out.
We're on the same page here.
And just to state the obvious: they are not values for the same sorts of reasons that, while closures are values in Lisps, Lisps invariably cannot reliably (or at all) print closures and read them back, but here it's worse because jq's type system is JSON's, and JSON decidedly does not have such types of values as runnable code!
Today $... are immutable lexical bindings for values -- constant-like. So $... for co-expressions might be confusing given that they are not values and their state mutates when invoked? But then having it be $@... may well suffice on that score, and since that is just bikeshedding, if you ship this feature first then it can be your way :)
We just have to teach users that $@... captures a generator (AKA iterator), not a value. It might still take some time for people to wrap their heads around this, but hey, I heard that jq has this effect on people in general. ;)
😆 it sure does! When it comes to bikeshedding, once you're past all the blocking issues then the first one to release wins. An issue and wiki page here to document what you've done so we can stay compatible would be highly appreciated :)