FastExpressionCompiler icon indicating copy to clipboard operation
FastExpressionCompiler copied to clipboard

Data-oriented representation of Expression partly on stack and optimized for constant collection

Open dadhi opened this issue 8 months ago • 2 comments

Imagine having a compact and serializable Expression representation by using a SOA (structure of arrays):


public struct Expr
{
    public int RootLambdaIndex;
    
    // using ImTools.SmallList4 will keep the first 4 items on stack

    public SmallList4<LambdaItem> Lambdas; // the root lambda is referred by index
    public SmallList4<object> Constants;
    public SmallList4<ParamOrVarEntry> ParamsAndVars;
    public SmallList4<MethodLikeItem> CtorsMethodsInvokes;
    
    public SmallList4<BinaryEntry> Binaries;
    public SmallList4<UnaryEntry> Unaries;
    
    // ... the rest non-conformists
}

// The lambda is repsented as a slice of `LambdaItem`, 0 element in slice denoting the ParamCount. 
// The next ParamCount items refer to `ParamsAndVars` by index in `ParamCountOrParamIndex`, 
// the `ReturnTypeOrNullForParam` being null (@perf how to optimize it even more?)
public struct LambdaItem
{
    public int ParamCountOrParamIndex;
    public Type ReturnTypeOrNullForParam;
}

public struct ParamOrVarEntry
{
    public Type ParamType;
    public bool IsByRef;

    // For now Name is a string, but it may be even a slice referencing to the external continuos text of all strings, 
    // including all Param and Var names.
    public string NameMaybe;
}

public struct MethodLikeItem
{
    public MethodBase Method; // ConstructorInfo, MethodInfo, etc.
    // Instead of enum and tranlastion to the corresponing list Expr, it may be the offest inside Expr struct, or both the enum with the offset value
    public ExprKind ParamKind;
    public int ParamIndex;
}

// Can be constructed like this:
Expr expr = default;
expr.Lambda<Func<int, Foo>>(
    expr.New(typeof(Foo).GetConstructors()[0], 
        Constant("hey, "), 
        expr.Parameter(typeof(int), "n", same: 0), 
        Constant("sistah", putIntoClosure: true)),
    expr.Parameter(same: 0)
);


Lambda<int, Foo> lambda = expr.CompileFastToLambda();
var fooSis = lambda.Invoke(31);

lambda.Closure.SetConstant(0, "brotha");
var fooBro = lambda.Invoke(42);

// Where the Lambda might be
public struct Lambda<T1, R> 
{
    public Func<ArrayClosure, T1, R> Func;
    public FriendlyClosure Closure;
    public R Invoke<T1>(T1 t1) => Func(FriendlyClosure.ArrayClosure, t1); 
}

The benefits:

  1. Less memory consumed overall.
  2. Small expressions are fully on stack.
  3. Using the indexes to refer to the children expressions makes it transferrable and persistent over the network, benefits the De/Serialization making it almost no-op.
  4. Data is grouped and serialized by the ExpressionType simplifies the processing of the specific ExpressionType, e.g. collecting the Constants does not require the full expression traversal, nor the collecting of the nested Lambdas or the nested Blocks.

The cons:

  1. Not compatible with System Expression API and requires the conversion.

Update - 2025.04.22

  • Expression forest as a single struct of "array"
  • Use precise capacity for "arrays" for the expression with all known sub-expressions, number of constants, calls, etc.

dadhi avatar Apr 04 '25 09:04 dadhi

Sounds like an exciting idea. When compiling expressions, rarely do you actually care about the class instances used as building blocks, and you know ahead of what parts you need to construct the expression with.

ovska avatar Apr 06 '25 18:04 ovska

@ovska Yes, usually an expression is an intermediary to be composed and immediately compiled into the delegate.

dadhi avatar Apr 06 '25 18:04 dadhi