EvalEx

Can arrayJson filtering be supported?

Open lcc0739 opened this issue 1 year ago • 6 comments

Something like the following:

ARRAYFIND(data_mapping_model, item => item.map_type == '3')

Loop through the array and filter it by the conditional expression, returning every element for which the expression is true.
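To make the requested semantics concrete, here is a minimal plain-Java sketch of what ARRAYFIND would be expected to do. It does not use EvalEx at all; the class name ArrayFindSketch, the method arrayFind, and the sample records are assumptions made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class ArrayFindSketch {

  // Returns every element for which the condition is true, preserving order.
  static <T> List<T> arrayFind(List<T> array, Predicate<T> condition) {
    return array.stream().filter(condition).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Hypothetical sample data standing in for data_mapping_model.
    List<Map<String, String>> dataMappingModel = List.of(
        Map.of("name", "a", "map_type", "3"),
        Map.of("name", "b", "map_type", "1"),
        Map.of("name", "c", "map_type", "3"));

    // Equivalent of: ARRAYFIND(data_mapping_model, item => item.map_type == '3')
    List<Map<String, String>> found =
        arrayFind(dataMappingModel, item -> "3".equals(item.get("map_type")));

    System.out.println(found.size() + " matching elements");
  }
}
```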

lcc0739 avatar Apr 15 '24 09:04 lcc0739

Lambda/closure syntax is not supported, but there is a workaround, as this discussion thread shows: https://github.com/ezylang/EvalEx/discussions/436. It should be able to do what you want.

stevenylai avatar Apr 17 '24 03:04 stevenylai

What about creating a LAMBDA function (similar to MS Excel)? Something like FILTER(data_mapping_model, LAMBDA(item, item.map_type == '3')) (not implemented).

oswaldobapvicjr avatar Apr 30 '24 21:04 oswaldobapvicjr

@oswaldobapvicjr How would we know that the LAMBDA function's first parameter has to be set to the actual array value? The LAMBDA function does not know about its surrounding FILTER function.

uklimaschewski avatar May 05 '24 08:05 uklimaschewski

Maybe it's not so easy (or even possible). But a first idea would be to accept the LAMBDA function as a lazy parameter inside the FILTER function.

The FILTER function would be responsible for iterating through the array and preparing the call to the LAMBDA function on each iteration, assigning the current element as the actual value of the first token of the LAMBDA function. This way, the LAMBDA function can stay agnostic about the surrounding operation.

The idea is to have the LAMBDA function re-usable by other functions such as FILTER, FIND_FIRST, MAP, REDUCE, etc.
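As a rough illustration of that idea, the following plain-Java sketch models a lazy LAMBDA as a parameter name plus an unevaluated body, with FILTER and MAP owning the iteration. Nothing here is EvalEx API: the names LazyLambdaSketch, Lambda, filter, and map are hypothetical, expressions are stood in for by Java functions over a variable map, and copying the scope on each call is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class LazyLambdaSketch {

  /** A lazily evaluated LAMBDA: a parameter name plus a body that is not run until called. */
  static final class Lambda {
    final String parameterName;
    final Function<Map<String, Object>, Object> body;

    Lambda(String parameterName, Function<Map<String, Object>, Object> body) {
      this.parameterName = parameterName;
      this.body = body;
    }

    /** Binds the argument to the parameter name in a copy of the scope, then runs the body. */
    Object call(Map<String, Object> outerScope, Object argument) {
      Map<String, Object> scope = new HashMap<>(outerScope);
      scope.put(parameterName, argument);
      return body.apply(scope);
    }
  }

  /** FILTER owns the iteration; the lambda knows nothing about it. */
  static List<Object> filter(List<?> array, Lambda predicate, Map<String, Object> scope) {
    List<Object> kept = new ArrayList<>();
    for (Object element : array) {
      if (Boolean.TRUE.equals(predicate.call(scope, element))) {
        kept.add(element);
      }
    }
    return kept;
  }

  /** MAP reuses the same Lambda type unchanged. */
  static List<Object> map(List<?> array, Lambda mapper, Map<String, Object> scope) {
    List<Object> mapped = new ArrayList<>();
    for (Object element : array) {
      mapped.add(mapper.call(scope, element));
    }
    return mapped;
  }

  public static void main(String[] args) {
    Map<String, Object> scope = new HashMap<>();
    scope.put("total", 10);

    Lambda isEven = new Lambda("item", s -> ((Integer) s.get("item")) % 2 == 0);
    Lambda timesTotal =
        new Lambda("item", s -> ((Integer) s.get("item")) * ((Integer) s.get("total")));

    System.out.println(filter(List.of(1, 2, 3, 4), isEven, scope)); // prints [2, 4]
    System.out.println(map(List.of(1, 2, 3), timesTotal, scope));   // prints [10, 20, 30]
  }
}
```

The same Lambda value works for both functions because the binding of the current element happens in the caller, not in the lambda itself.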

oswaldobapvicjr avatar Jul 10 '24 22:07 oswaldobapvicjr

@oswaldobapvicjr Then what's the difference between using LAMBDA and the workaround in https://github.com/ezylang/EvalEx/discussions/436? Basically, a map / filter function like MAP(products, quantity, total * quantity) can detect the variable name quantity and then iterate through the array, preparing the necessary variables for the last lazy expression, without the need for a new LAMBDA function.

I didn't propose adding this as a new function to the repository because, after running through the array, the variable list is left with a 'stray' value. In the case of MAP(products, quantity, total * quantity), there will be a new variable called quantity after evaluation, equal to the last element of the array products. And I don't see any easy way to overcome this if we introduce LAMBDA. But if we all agree that this 'stray' value is not an issue, then we can add those functions.
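The 'stray' value can be reproduced without EvalEx. The plain-Java sketch below stands in for the workaround: the variable table is a plain map, the lazy expression is a Java function, and the names StrayVariableSketch and mapWithSharedVariables are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class StrayVariableSketch {

  // Evaluates the mapper once per element, publishing the current element
  // under the placeholder name in the SHARED variable table.
  static List<Object> mapWithSharedVariables(
      List<?> array,
      String placeholder,
      Function<Map<String, Object>, Object> mapper,
      Map<String, Object> variables) {
    List<Object> mapped = new ArrayList<>();
    for (Object element : array) {
      variables.put(placeholder, element); // this write outlives the function call
      mapped.add(mapper.apply(variables));
    }
    return mapped;
  }

  public static void main(String[] args) {
    Map<String, Object> variables = new HashMap<>();
    variables.put("total", 10);

    // Equivalent of: MAP(products, quantity, total * quantity) with products = [1, 2, 3]
    List<Object> result = mapWithSharedVariables(
        List.of(1, 2, 3),
        "quantity",
        vars -> ((Integer) vars.get("total")) * ((Integer) vars.get("quantity")),
        variables);

    System.out.println(result);                    // prints [10, 20, 30]
    System.out.println(variables.get("quantity")); // prints 3, the stray value
  }
}
```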

stevenylai avatar Jul 11 '24 06:07 stevenylai

To implement those higher-order functions without leaving any 'stray' variables after evaluation (in my opinion, 'stray' variables are not only an inconvenience but could also overwrite other variables if users are not careful), I think we can consider using a temporary map / dataAccessor while the function iterates through the array, performing the evaluations one by one.

A modified map function might look as follows (I use map as an example, but filter is essentially the same):

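import com.ezylang.evalex.EvaluationException;
import com.ezylang.evalex.Expression;
import com.ezylang.evalex.data.DataAccessorIfc;
import com.ezylang.evalex.data.EvaluationValue;
import com.ezylang.evalex.functions.AbstractFunction;
import com.ezylang.evalex.functions.FunctionParameter;
import com.ezylang.evalex.parser.ASTNode;
import com.ezylang.evalex.parser.Token;
import java.util.ArrayList;
import java.util.List;

@FunctionParameter(name = "array")
@FunctionParameter(name = "placeholder", isLazy = true)
@FunctionParameter(name = "mapper", isLazy = true)
public class MapFunction extends AbstractFunction {
  @Override
  public EvaluationValue evaluate(
      Expression expression, Token functionToken, EvaluationValue... parameterValues)
      throws EvaluationException {
    List<EvaluationValue> array = parameterValues[0].getArrayValue();
    // Lazy parameters arrive unevaluated, so the placeholder's name can be read from its token.
    String placeHolder = parameterValues[1].getExpressionNode().getToken().getValue();
    ASTNode mapper = parameterValues[2].getExpressionNode();

    List<EvaluationValue> mapped = new ArrayList<>();
    // Get a temporary dataAccessor so the placeholder never reaches the expression's own variables.
    DataAccessorIfc tmp = expression.getConfiguration().getDataAccessorSupplier().get();
    for (EvaluationValue value : array) {
      tmp.setData(placeHolder, value);
      // evaluateSubtree(ASTNode, DataAccessorIfc) is the proposed new overload; it does not exist yet.
      mapped.add(expression.evaluateSubtree(mapper, tmp));
    }
    return EvaluationValue.arrayValue(mapped);
  }
}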

Here we need to add a new method to Expression: EvaluationValue evaluateSubtree(ASTNode startNode, DataAccessorIfc variables). This new API passes the variables on to an updated getVariableOrConstant(Token token, DataAccessorIfc variables), where variable resolution has the following precedence:

  1. Resolve from the variables parameter (new)
  2. Resolve from this.constants (existing)
  3. Resolve from this.dataAccessor (existing)
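The lookup order above can be sketched in plain Java as follows. This is only an illustration of the precedence, not EvalEx code: the three tables are modeled as plain maps, and the names VariableResolutionSketch and resolve are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class VariableResolutionSketch {

  // Looks a name up in the temporary variables first, then the constants,
  // then the expression's own data accessor.
  static Object resolve(
      String name,
      Map<String, Object> variables,    // 1. the new per-call parameter (may be null)
      Map<String, Object> constants,    // 2. stands in for this.constants
      Map<String, Object> dataAccessor  // 3. stands in for this.dataAccessor
  ) {
    if (variables != null && variables.containsKey(name)) {
      return variables.get(name);
    }
    if (constants.containsKey(name)) {
      return constants.get(name);
    }
    if (dataAccessor.containsKey(name)) {
      return dataAccessor.get(name);
    }
    throw new IllegalArgumentException("No variable or constant named '" + name + "'");
  }

  public static void main(String[] args) {
    Map<String, Object> variables = new HashMap<>();
    variables.put("quantity", 2);
    Map<String, Object> constants = new HashMap<>();
    constants.put("TRUE", Boolean.TRUE);
    Map<String, Object> dataAccessor = new HashMap<>();
    dataAccessor.put("quantity", 99);
    dataAccessor.put("total", 10);

    System.out.println(resolve("quantity", variables, constants, dataAccessor)); // prints 2
    System.out.println(resolve("total", variables, constants, dataAccessor));    // prints 10
  }
}
```

With this order, a placeholder bound in the temporary table shadows a same-named variable in the expression's own table for the duration of the call, without overwriting it.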

There are some other concerns though:

  1. Depending on the implementation (users may override this), dataAccessorSupplier may return the same underlying DataAccessorIfc (i.e. the user is reusing the same variable set for multiple expressions), in which case they will still see those stray variables in their variable table. Or worse, if they are not careful when using those map / filter functions and define a placeholder variable name that collides with an existing one, the existing variable will be overwritten.
  2. The new evaluateSubtree will need to be public, which may make the API more complex and less clean. Also, from an API consistency perspective, if we allow users to optionally pass in a DataAccessorIfc to evaluateSubtree, it would make sense to allow the same for the evaluate method.
  3. Once we have such APIs where user can pass in an 'extra variable table', things may become even more complex when the user wants to do something like the following:
     var expression = new Expression("MAP(products, quantity, total * quantity)");
     // Create a dataAccessor on the fly.
     var dataAccessor = expression.getConfiguration().getDataAccessorSupplier().get();
     dataAccessor.setData(
         "products", EvaluationValue.of(List.of(1, 2, 3), expression.getConfiguration()));
     // Oops, "products" is not set in this.dataAccessor.
     // Perhaps dataAccessor should go all the way into AbstractFunction.evaluate()?
     expression.evaluate(dataAccessor);
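The first concern in the list above, a supplier that hands back a shared table, can be illustrated with a small plain-Java sketch. The supplier is modeled as a Supplier of a plain map rather than a real DataAccessorIfc supplier, and the names SharedAccessorSketch and iterate are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

final class SharedAccessorSketch {

  // Binds each element to the placeholder in whatever table the supplier hands out.
  static void iterate(List<?> array, String placeholder, Supplier<Map<String, Object>> supplier) {
    Map<String, Object> tmp = supplier.get();
    for (Object element : array) {
      tmp.put(placeholder, element);
    }
  }

  public static void main(String[] args) {
    Map<String, Object> userVariables = new HashMap<>();
    userVariables.put("quantity", 42);

    // A supplier that creates a fresh table on every call leaves the user's table alone.
    iterate(List.of(1, 2, 3), "quantity", HashMap::new);
    System.out.println(userVariables.get("quantity")); // prints 42

    // A supplier that always returns the same shared table overwrites the user's variable.
    iterate(List.of(1, 2, 3), "quantity", () -> userVariables);
    System.out.println(userVariables.get("quantity")); // prints 3
  }
}
```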
    

stevenylai avatar Jul 13 '24 03:07 stevenylai