jq

feature request: eval

Open pkoppstein opened this issue 11 years ago • 23 comments

Currently, a jq expression can be conveniently represented as a JSON string in a JSON object, but there does not appear to be an "eval" filter to evaluate such a string when presented as data. Such functionality is essential for a JSON-based integrity-assurance tool that I would like to implement with jq.

For example, the constraint that "votes" should be an array of objects, each of which must have an "id" element, could be expressed as follows:

{"votes": "type == \"array\" and ([ .[] | type==\"object\" and has(\"id\") ] | all)" }

Since jq already reads, parses and evaluates such strings, I'm hoping that adding eval (or eval(_)) won't be too difficult.
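For reference, the constraint above already works when it is handed to jq as a program on the command line; what is missing is a way to run it when it arrives as data. A minimal sketch, with sample documents invented purely for illustration:

```shell
# The constraint from the example, applied to the "votes" value directly.
# Both sample documents below are made up for this sketch.
constraint='type == "array" and ([ .[] | type == "object" and has("id") ] | all)'

good='{"votes": [{"id": 1}, {"id": 2}]}'
bad='{"votes": [{"id": 1}, {"name": "x"}]}'

# The shell expands $constraint before jq sees the program text.
good_result=$(printf '%s' "$good" | jq ".votes | ($constraint)")
bad_result=$(printf '%s' "$bad" | jq ".votes | ($constraint)")

echo "$good_result"   # true
echo "$bad_result"    # false
```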

pkoppstein avatar Jun 06 '14 04:06 pkoppstein

My plan is to have several I/O handle types. stdin would refer to stdin, and so on, natch (these symbolic names are really functions that output something like a file descriptor number, so under the hood they're just some JSON value naming a stream). But then one could even have streams that correspond to other jq VMs started by compiling a jq program (e.g., with a function named compile). This would allow for eval, of course, but also co-routines.

nicowilliams avatar Jun 06 '14 07:06 nicowilliams

If you look through various issues and through the branches of my jq clone... you'll see what I mean. I spent some time experimenting with how best to do I/O and ended up with the yield abstraction as a result (so I could have a read function that reads one thing, but call it iteratively from jq code (jq, not C) to produce streams). I ran out of time, so it's not ready for 1.4, and my mid-cleanup branch may not be in my github clone. Anyways, I hope to find time for this after 1.4 ships and then maybe ship 1.5 sooner. I can see that there's a pent-up need for these features...

nicowilliams avatar Jun 06 '14 07:06 nicowilliams

Unfortunately I don't understand the connection you have in mind between eval(_) and I/O handle types, but I'd happily settle for a filter that takes a string as its argument, and interprets it as a jq program relative to its input -- that is, an efficient version of something like the following, assuming "system(_)" were available:

def eval(s): system( " jq -M '" + s + "'" );
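Until something like this exists, roughly the same effect can be had from the shell by extracting the program string with one jq invocation and handing it to a second one. This is only a sketch of that workaround, not an efficient eval; the constraints object and the data document are assumptions made up for the example:

```shell
# A constraints object (program stored as a JSON string) and some data.
constraints='{"votes": "type == \"array\" and ([ .[] | type==\"object\" and has(\"id\") ] | all)"}'
data='{"votes": [{"id": 1}, {"id": 2}]}'

# Pass 1: pull the jq program out of the constraints object as a raw string.
prog=$(printf '%s' "$constraints" | jq -r '.votes')

# Pass 2: run that string as a jq program against the matching part of the data.
verdict=$(printf '%s' "$data" | jq -M '.votes' | jq -M "$prog")

echo "$verdict"   # true
```

Each evaluation costs a new jq process here, which is exactly the overhead a built-in eval would avoid.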

pkoppstein avatar Jun 06 '14 22:06 pkoppstein

Let's say that one way to implement eval (and co-routines) would be to start a new jq virtual machine and communicate libjq jv values between the invoking VM and the VM running the eval'ed program (or co-routine). jq(1)'s I/O is not something that the VM is aware of, since it deals only in libjq jv values -- it is main() that takes care of reading from stdin, producing jv values, then feeding them to the jq VM, then encoding the outputs and printing them on stdout. One can see communicating VMs as "reading" and "writing" to each other. And if eval is a subset of the co-routine case...

nicowilliams avatar Jun 06 '14 22:06 nicowilliams

I see no reason for this eval to be separated in another VM... it should just apply its parameter filter on its input, compile every string result from it (which will typically be one result) as a new filter, then apply that to its input again. All in the same VM, no new type of I/O, no nothing. You do not need to pass any new type of data between filters or as part of the intermediate or final streams.
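The semantics described here can be approximated from the shell to see how they would behave. The sketch below uses separate jq processes rather than a single VM, so it only illustrates the data flow; eval_filter is a hypothetical helper name, not something jq provides, and it assumes each program string fits on one line:

```shell
# eval_filter is a hypothetical helper, not part of jq.
# $1 is a jq filter; every string it outputs is run as a jq program
# against the same input that the filter itself received.
eval_filter() {
  input=$(cat)
  printf '%s' "$input" | jq -r "$1" | while IFS= read -r prog; do
    printf '%s' "$input" | jq -c "$prog"
  done
}

# The program lives in the input and is selected by the filter argument.
printf '%s' '{"x": 41, "prog": ".x + 1"}' | eval_filter '.prog'   # 42

# The program is given as a jq string literal.
printf '%s' '42' | eval_filter '".+1"'                            # 43
```

If the argument filter produces several strings, each one is evaluated in turn against the original input.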

georgir avatar Jun 12 '14 17:06 georgir

@georgir This is all implementation details. I will first finish the yield and I/O stuff, and then we'll see.

nicowilliams avatar Jun 12 '14 20:06 nicowilliams

@georgir I should add that using a separate VM would be easy to implement, but it would add some complications: each thing eval'ed would be in a brand-new environment, so making it inherit loaded libraries and preserving defs across evals would be difficult. On the other hand, generating new bytecode for a running VM presents its own difficulties. Another distinction: do we want one set of bytecode and N VMs, N VMs and N sets of bytecode, or one VM and one set of bytecode? Implementation details are not as interesting as semantics though, and at this stage I'd rather talk about semantics.

Some possible semantics:

  • one eval'ed thing can create defs available to the next (desirable for a REPL!)
  • one eval'ed thing can set bindings (variables) for the next (also desirable for a REPL)
  • each eval'ed thing can produce multiple outputs, just like any jq program, but can they be consumed in an alternating manner? restatement: can jq get past depth-first searching and backtracking and do breadth-first as well?

In the Icon programming language (and in Prolog) the search process is depth-first. But Icon also has "co-expressions" (co-routines with light-weight syntax) for doing breadth-first searching. In Scheme one can use the current continuation to similar effect. The point is, once we have a way to read one item from a given stream, and many streams to read from, we can have the same concept for jq. Looping back to implementation details, such semantics inexorably lead to a stack per-coroutine -or to having every invocation frame on a heap-; clearly a VM per-eval has a lot of potential.

Anyways, first things first.

nicowilliams avatar Jun 12 '14 21:06 nicowilliams

@georgir - Thanks for your interest and insights. I agree that we should put the desired semantics in the driver's seat, at least at this stage.

For the "use case" I have in mind (see above), it would be essential to have an eval that does NOT carry any baggage from its calling environment.

On the other hand, it would be inconvenient for the string-argument of eval to have to include all the relevant function definitions explicitly. Fortunately, @nicowilliams is already addressing the I/O issue, so once that is ready, it should be easy to support eval(_;_) so that we could write:

.filter as $filter | .context as $context | eval($filter; $context) 

where $context is a URL or name or pathname of a file defining jq functions (or modules :-).

Alternatively (or in addition), jq could support some kind of "require" or "import" functionality, e.g.

echo '{"data": 1, "jq": "require \"http://modules.jq/org/mymodule\"; foo"}' | jq '.data | eval(.jq)'

One advantage of the latter is that it sidesteps the problem of multiple arities, but I would not mind having to pass null as the second argument of eval if there is no context.

pkoppstein avatar Jun 14 '14 06:06 pkoppstein

Yes, a way to sandbox eval is important, and the I/O work helps that. But it'd also be useful to have a way to inherit defs and bindings from the caller: for a REPL. A REPL that loses all state between input commands is kinda useless :). We need both.

nicowilliams avatar Jun 14 '14 07:06 nicowilliams

@nicowilliams wrote:

We need both.

So how about:

eval( STRING; null ) ~~ vanilla context
eval( STRING; true ) ~~ caller's context
eval( STRING; CONTEXT ) ~~ context specified by CONTEXT (a filename, filepath, URL, module, ...)

pkoppstein avatar Jun 14 '14 17:06 pkoppstein

Reifying parts of the eval caller's context -reflection- seems ETOOHARD, though so did TCO and it turned out to be easy and a big win. I'd rather avoid it for now, using explicit contexts instead:

  • eval (no arguments) should eval the input program in a null context with only stdin/out/err and normal builtins
  • eval({}) and eval(null) should eval the input program in a null context with no I/O handles and only normal builtins
  • eval(context) should eval the input program in the given context

A context should be an object with:

  • variable definitions (like the command-line --arg option, basically)
  • file handles
  • libraries/modules to import as if they were builtins

Later, if we want to allow the eval'ed program to see all bindings visible to the eval caller we can just add something like eval(true) to mean "inherit/pass through/use all of the caller's context".
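To make the "variable definitions" part of a context concrete, here is a sketch that maps a context object onto today's --arg option from the shell. The "vars" key and the overall shape of the context object are assumptions for illustration only; nothing in this thread fixes a schema, and file handles and modules are left out. Values containing newlines would not survive this simple loop.

```shell
# Hypothetical context object: only variable definitions are modelled.
context='{"vars": {"threshold": "3"}}'
program='.n > ($threshold | tonumber)'

# Turn each entry of .vars into a "--arg NAME VALUE" pair.
set --
while IFS= read -r name && IFS= read -r value; do
  set -- "$@" --arg "$name" "$value"
done <<EOF
$(printf '%s' "$context" | jq -r '.vars | to_entries[] | .key, .value')
EOF

# Run the program with the bindings taken from the context.
result=$(printf '%s' '{"n": 5}' | jq "$@" "$program")
echo "$result"   # true
```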

nicowilliams avatar Jul 08 '14 23:07 nicowilliams

re: eval(context)

I do not think an object can contain anything other than actual normal json-able values, nor do I think it is a good idea to make it possible. So file handles, libraries, modules, filters... are out as far as I'm concerned.

Also, filters do not really receive value arguments, but other filters. Which might return multiple values. So what should eval do if context returns multiple values?

So no, I do not think passing it a context at all is needed or makes sense. I think input value is all that is needed.

I'm a bit too scared of feature creep, and unneeded complexity. But if you find a way to make it more robust and still making sense, I might reconsider.

georgir avatar Jul 09 '14 00:07 georgir

@georgir File handles being like what they are in, say, POSIX. So not a first-class type but a name index into an implied set of open handles. That's OK. The file handle stuff is on its way (but by default jq programs will be as sandboxed as today).

nicowilliams avatar Jul 09 '14 00:07 nicowilliams

More specifically, I imagined the interface as something to the effect of eval(program), using the current input as the input for the evaled program. Not having the current defs and vars is acceptable.

42 | eval(".+1") => 43

georgir avatar Jul 09 '14 00:07 georgir

Or eval(".+1"; 42) => 43

georgir avatar Jul 09 '14 00:07 georgir

@georgir Oh, hmm, yes, that's right, eval should apply an argument program to its inputs.

nicowilliams avatar Jul 09 '14 00:07 nicowilliams

To summarize, as I understand it, the general form of eval will be

eval(STRING; CONTEXT)

where STRING is a string that will be compiled in a context specified by CONTEXT.

I would suggest that we envision a filter named system and define a set of jq flags (even if they are never implemented) so that we can say that the semantics of eval(STRING; CONTEXT) is that of system("jq ARGS 'S'"), where

  • S is a suitably escaped version of STRING, and
  • ARGS is a set of command line options that depends on CONTEXT.

For example, assuming -q means "do not read ~/.jq" we could write:

eval(STRING) ~~ system("jq -q -M 'S'")
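For the "suitably escaped" part, jq's existing @sh format can already produce S from STRING. The sketch below builds the command line that the envisioned system filter would receive and runs it with sh instead; the -q flag is left out because it is hypothetical, and the program '. + 1' is just a placeholder:

```shell
# Build "jq -M 'S'" where S is STRING quoted for a POSIX shell by @sh.
cmd=$(jq -n -r --arg prog '. + 1' '$prog | @sh "jq -M \(.)"')
echo "$cmd"      # jq -M '. + 1'

# Stand-in for system(...): let sh run the generated command on some input.
result=$(printf '42' | sh -c "$cmd")
echo "$result"   # 43
```

Because @sh wraps the interpolated string in single quotes and escapes any single quotes inside it, program text containing quotes should reach the child jq unchanged.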

pkoppstein avatar Jul 09 '14 02:07 pkoppstein

I would like this feature as well. +1 !

jmatsushita avatar May 26 '15 16:05 jmatsushita

The c-coded-generators branch of my clone has the basis for C-coded generators, which is what's needed to make eval possible.

nicowilliams avatar May 26 '15 22:05 nicowilliams

@nicowilliams Great to hear!

jmatsushita avatar May 27 '15 10:05 jmatsushita

I want to know what's the status on this issue. It's definitely not a must-have feature, but having it would be great!

crides avatar Jul 23 '19 15:07 crides

+1, would love to have this

traut avatar Apr 16 '24 14:04 traut

If someone wants to play around with how eval could work in jq then jqjq has some basic eval support:

$ ./jqjq -n '{a: 1} | eval(".b = 2") | eval(".b, .c.d") += 3'
{
  "a": 1,
  "b": 5,
  "c": {
    "d": 3
  }
}

wader avatar Apr 16 '24 14:04 wader