jq icon indicating copy to clipboard operation
jq copied to clipboard

Constants (def $foo: EXP;)

Open pkoppstein opened this issue 10 years ago • 36 comments

Sometimes one would like to be able to invoke a constant function multiple times without having to incur the cost of computation more than once. A trivial but perhaps fairly common case is:

def pi: 1|atan *4;

A more important example would be a dictionary that must be constructed.

Of course, there are workarounds, but a simple solution consistent with jq semantics would be to allow a function of arity 0 to be annotated so that once its value has been computed, no further computation is required, e.g.

def final pi: 1|atan * 4;

Two other minimally disruptives options would be (using the 'def pi' example):

let pi: 1|tan*4; # Swift-like

or:

defconst pi: 1|atan * 4;

I realize this could be generalized, but the case for the more generalized version is at best more complex.

pkoppstein avatar Jun 25 '15 15:06 pkoppstein

I like the idea of using "let" for filters that can't use the . argument.

ghost avatar Jun 25 '15 15:06 ghost

I've been thinking that i want something like def $name: EXP; as the syntax for this. With built-ins that can do I/O, like inputs checking that an expression is constant is tricky...

This is for after 1.5.

nicowilliams avatar Jun 25 '15 15:06 nicowilliams

With better constant expression folding we could totally do what @pkoppstein wants, though def $name: ... will yield faster code, and won't require writing code to detect that . is not used. The thing is that we'll have either defer computing the value till first run, run these expressions at compile time, or make constant folding really smart. If i code this I'll go for the latter, i think, but it does seem a bit brittle.

nicowilliams avatar Jun 25 '15 15:06 nicowilliams

@nicowilliams - Your idea of 'def $name' fits in nicely with the existing '--arg/--argfile/--argjson' framework, and I suppose that means compile-time evaluation, though I did have lazy (on-first-invocation) evaluation in mind.

For compile-time evaluation, it would be important to ensure that the --arg/--argfile/--argjson variables be setup first so that the constant-functions can use them.

Perhaps we could have 'def $name' in jq 1.5, with the possibility of a different syntax for lazy evaluation for some future release?

pkoppstein avatar Jun 25 '15 16:06 pkoppstein

Hmmm, if we want --arg* globals to be visible we'd have to either change how they are bound (we'd have to pass along an object containing all those vars to a lot of the functions in compile.c, probably all of them), or we'd have to eval on first reference, which is very hard because you see, constants compile down to a single opcode (LOADK) with an immediate operand (the constant). But the first part doesn't work either because of the order in which bindings take place (right to left): we can't know at constant-fold time whether a symbol reference will be global or not. So this is all rather complex. Constant folding might have to move to block_compile() so that it can work with fully-bound programs. Ay.

Don't expect this any time soon.

nicowilliams avatar Jun 25 '15 16:06 nicowilliams

We could have def $name: <constant exp>; in 1.5, but in 1.5 it could only support simple arithmetic and nothing else (because anything else would require much more work).

nicowilliams avatar Jun 25 '15 16:06 nicowilliams

Moving constant folding to block_compile() will have other benefits, such as allowing one to redefine binary operation builtins (though one should not do this).

nicowilliams avatar Jun 25 '15 16:06 nicowilliams

One other important benefit of deferring constant folding as much as possible is to avoid unnecessary computation, since we drop unreferenced symbols' implementations.

What should happen in a case like def $foo: range(5);? $foo can only have one value (EDIT: either throw an error or take just the first value). Allowing the full range of expressions in the definition of a constant also means we have no choice but to interpret its byte-code to evaluate it the first time. Perhaps the best way to handle this then is to have an opcode like "CALLK" that provides a slot for memoizing a constant. Then we can keep the existing simple constant folding as-is for now.

We'll need to label all C-coded builtins as to whether they are deterministic. Currently we only really have one non-deterministic C-coded builtin: _input.

nicowilliams avatar Jun 25 '15 21:06 nicowilliams

What should happen in a case like def $foo: range(5);

An even worse case is def $foo: empty; since in that case there is no first item in the stream to use as the constant.

From my perspective, though, it's far more important to support the creation of valid constant functions than to flag invalid attempts to create a constant function. In fact, taking @slapresta's observation one step further, I'd suggest that 'let _: ...;' be viewed as a declaration that _ is a constant function. If an invalid function body is specified, then the behavior can be declared to be "undefined".

pkoppstein avatar Jun 26 '15 02:06 pkoppstein

@pkoppstein

If it's a constant value, then the name should be $whatever. Functions, recall, take inputs, so we'd have to be careful that the input isn't actually used. Whereas $foo is always a constant.

let $foo: ...; or def $foo: ...; differs only in a keyword.

How we handle errors is rather important, IMO. Adding obscure error cases is not a good idea.

Clearly, if there's more than one value we can just take the first or error.

If there are no values we can error.

Either way, undefined it won't be.

nicowilliams avatar Jun 26 '15 04:06 nicowilliams

@nicowilliams wrote:

If it's a constant value, then the name should be $whatever.

I'm fine with that, but since (as you say) "functions take inputs, so we'd have to be careful that the input isn't actually used", I think (along with @slapresta), that 'let $foo: ...' would be wise. (We could alternatively adopt the rule that 'let' sets the input to null if you like.)

Clearly, if there's more than one value we can just take the first or error. What you wrote seemed to indicate you wanted an exclusively (?) compile-time analysis.

pkoppstein avatar Jun 26 '15 04:06 pkoppstein

Don't focus on the keyword. That's not important. I like def $foo: ...; just because.

The code that outputs the constant does need an input value, yes. That input value should be null.

I think the only question is how to implement. I'll not think about that for a while, so we're done for now until someone submits a PR.

nicowilliams avatar Jun 26 '15 04:06 nicowilliams

I don't know how to write a better analysis than running the bytecode (but only if the constant is referenced). I think that's fine.

nicowilliams avatar Jun 26 '15 04:06 nicowilliams

I think that's fine.

Yep. Thanks!

pkoppstein avatar Jun 26 '15 04:06 pkoppstein

This would be a very convenient way to implement #820.

nicowilliams avatar Jun 27 '15 02:06 nicowilliams

I think the high-level implementation plan is as follows:

  • for each def $name: EXP; generate a function like def gensym: EXP | error({__jq: {constant: .}});
  • for each reference to a constant def generate a LOADK_OR_CALLK opcode
  • the new opcode will check if the constant is valid, if not it will call the associated function
  • on backtrack the new opcode will raise an error unless we're raising one already, in which case it will catch the error if it is of the above form, and it will set the constant to the unwrapped value and will put a copy on the stack.

The details are another story.

EDIT: This means that multiple outputs (def $foo: range(5);) will be allowed, but only one will ever be produced -- only one will be taken. Not outputting any values (def $foo: empty;) will be an error.

nicowilliams avatar Jun 27 '15 02:06 nicowilliams

Implementation-wise, perhaps just have one new opcode that is like LOADV and differs only in that it skips the immediately following CALL_JQ and STOREV instructions when the var has already been set, else it does not skip them. The call would be to first(EXP), and if EXP produces no values... well, it's OK because we can then have a special value that backtracks (like Icon's &fail, oddly enough). I.e., def $foo: empty; . + $foo` -> nothing, no value, no error.

nicowilliams avatar Jun 27 '15 07:06 nicowilliams

I thought this might be easy enough, but there's tricky bits, so just so you know, there's no chance this will make 1.5.

I fixed one of @pkoppstein's module issues and I think we're ready for 1.5rc2. I plan on making the 1.5rc2 release sometime this week and we'll let it steep for a month.

I still need to finish merging @joelpurra's PRs, I know. They may make 1.5 but not 1.5rc2.

Bang away, especially at the streaming bits.

nicowilliams avatar Jun 27 '15 17:06 nicowilliams

@nicowilliams wrote:

I fixed one of @pkoppstein's module issues

Thank you for restoring the ability to import into the caller's namespace. That's great, but since you seem to have a few moments to spend on jq, I was hoping you'd be able to ponder the still less-than-ideal state of affairs. As I understand things, currently the importer has to choose between:

a) import "library" as whatever; # perfect

b) import "library"; # import into current namespace

c) import "library" as library; # a tad tedious

I've always felt that the module writer should be able to specify a default namespace name; within the current jq framework, that could come from module {"name": _}. If the default namespace name comes from the module declaration, then import "library"; would most naturally mean "use the default".

Thus even if the implementation of a mechanism for supporting a default namespace name is postponed, I think it would be better to adopt some other syntax for importing into the current namespace. My current favorite is import STRING as null.

pkoppstein avatar Jun 27 '15 18:06 pkoppstein

How about include "stuff"; for importing into the namespace and import "stuff"; for importing with the module's basename as the prefix?

nicowilliams avatar Jun 27 '15 19:06 nicowilliams

@pkoppstein I'm out of time in 3, 2, 1, ...

EDIT: I'll build a 1.5rc2, and then wait a while.

nicowilliams avatar Jun 27 '15 20:06 nicowilliams

I want to declare a feature freeze. From here until 1.5 is release, only build and test improvements will go in, module system improvements, and bug fixes.

nicowilliams avatar Jun 27 '15 20:06 nicowilliams

How about include "stuff"; for importing into the namespace and import "stuff"; for importing with the module's basename as the prefix?

Having one's cake and eating it too? Beautiful!

pkoppstein avatar Jun 27 '15 22:06 pkoppstein

aight, that'll be it.

nicowilliams avatar Jun 28 '15 00:06 nicowilliams

@pkoppstein include pushed.

nicowilliams avatar Jul 10 '15 15:07 nicowilliams

$ jq -n -r 'include "Abracadabra"; hi'
Thank you!
Sincerely,
~/.jq/Abracadabra/Abracadabra.jq

pkoppstein avatar Jul 10 '15 16:07 pkoppstein

We should mark builtins like inputs as being non-deterministic and then have the compiler do much more optimizations -- more constant folding. Currently that's hard because the compiler would have to do the same things that the interpreter does to evaluate code, but still. The point is, I don't think I want users to have to ask for the compiler to define constants based on arbitrary expressions; I want the compiler to just do it when it can.

nicowilliams avatar Jan 16 '16 01:01 nicowilliams

And we really should move all bytecode optimizations to work on blocks.

nicowilliams avatar Jan 16 '16 01:01 nicowilliams

My current thinking on this is that the compiler should automatically bracket constant expressions (ones using only constants and only deterministic functions) with two special instructions: one to elide (i.e., jump past) the whole computation if it has been completely memoized, and the other to save an output from the computation. A computation would be completely memoized only when the first instruction is backtracked past. There should be a maximum number of memoized outputs; if too many are produced then memoization is disabled for that expression.

This can take care of constant folding.

One option would be to not do this by default and instead provide const/1 builtin that applies memoization to a given expression (passing it null as its only input). This would apply even to non-deterministic expressions, thus allowing one to have a constant random number, for example (assuming we have a random builtin).

nicowilliams avatar Feb 26 '17 04:02 nicowilliams

This predated as $foo I think

emanuele6 avatar Dec 11 '23 12:12 emanuele6