assemblyscript icon indicating copy to clipboard operation
assemblyscript copied to clipboard

Implement closures

Open MaxGraey opened this issue 6 years ago • 94 comments

I decide to start discussion about simple closure implementations.

Some obvious (and may be naive) implementation is using generation class context which I prefer to demonstrate in following example:

declare function externalCall(a: i32, b: i32): void;

function range(a: i32, b: i32, fn: (n: i32) => void): void {
  if (a < b) {
    fn(a);
    range(a + 1, b, fn);
  }
}

export function test(n: i32): void {
  range(0, n, (i: i32) => {
    externalCall(i, n); // capture n
  });
}

which transform to:


// generated
class ClosureContext {
  fn: (ctx: usize, i: i32) => void;
  n: i32; // captured var
  // optinal "self: usize;" is closure instantiate inside instance class method
  parent: ClosureContext | null = null;
}

// generated
function lambdaFn(ctx: usize, i: i32): void {
  externalCall(i, changetype<ClosureContext>(ctx).n);
}

function range(a: i32, b: i32, ctx: usize): void {
  if (a < b) {
    changetype<ClosureContext>(ctx).fn(ctx, a); // replaced from "fn(a)";
    range(a + 1, b, ctx);
  }
}

export function test(n: i32): void {
  // insert
  let ctx = new ClosureContext();
  ctx.fn = lambdaFn;
  ctx.n = n;
  //
  range(0, n, changetype<usize>(ctx));
}

Closure and ClosureContext will not generated when no one variable was captured and use usual anonym functions. ClosureContext will not created when only this was captured. In this case ctx param use to pass this reference.

Other discussions: #563

@dcodeIO @jtenner @willemneal Let me know what you think about this?

MaxGraey avatar Aug 28 '19 17:08 MaxGraey

I think we need to override the meaning of call indirect.

jtenner avatar Aug 28 '19 18:08 jtenner

How would this transform?

function add(a, b): callback {
  return () => a + b;
}

jtenner avatar Aug 28 '19 18:08 jtenner

@jtenner


class ClosureContext {
  fn: (ctx: usize) => i32;
  a: i32;
  b: i32;
  parent: ClosureContext | null = null;
}

function lambdaAdd(ctx: usize): i32 {
  return changetype<ClosureContext>(ctx).a + changetype<ClosureContext>(ctx).b;
}

function add(a: i32, b: i32): ClosureContext {
  let ctx = new ClosureContext();
  ctx.fn = lambdaAdd;
  ctx.a = a;
  ctx.b = b;
  return ctx;
}

MaxGraey avatar Aug 28 '19 18:08 MaxGraey

Instead

class ClosureContext {
  fn: (ctx: usize) => i32;
  b: i32;
  ...
}

we could just store index for indirect function table

class ClosureContext {
  fnIdx: usize;
  a: i32;
  ...
}

... 

call_indirect(ctx.fnIdx, ...args)

MaxGraey avatar Aug 28 '19 18:08 MaxGraey

How does this work with function dispatch?

For instance, let's say I pass the Closure Context out to js as a pointer. How can I re-call it?

jtenner avatar Aug 28 '19 18:08 jtenner

Is there a way to make a table and add entries to it?

jtenner avatar Aug 28 '19 18:08 jtenner

@jtenner actually you need pass only fn / fnIdx. After that just use same approach as for anonymus indirect function calls.

EDIT No you need unpack fn from returned ClosureContext object. Just provide another util for loader

MaxGraey avatar Aug 28 '19 18:08 MaxGraey

Next example:

function test(fn: (x: i32) => void): i32 {
  let n = 0;
  fn(x => { n = x });
  return n;
}

should generate:

class ClosureContext {
  fn: (ctx: usize, x: i32) => void;
  n: i32;
  parent: ClosureContext | null = null;
}

function lambdaFn(ctx: usize, x: i32): void {
   changetype<ClosureContext>(ctx).n = x;
}

function test(fn: (x: i32) => void): i32 {
  let n = 0;
  let ctx = new ClosureContext();
  ctx.fn = lambdaFn;
  ctx.n = n;
  fn(changetype<usize>(ctx));
  n = ctx.n;
  return n;
}

MaxGraey avatar Aug 28 '19 19:08 MaxGraey

Well I'm thinking about aspect collecting function pointers. How will aspect need to work with the function pointers?

jtenner avatar Aug 28 '19 20:08 jtenner

My guess is that passing around the closure context will cause problems with manually using call_indirect like aspect does.

Also, this closure method doesn't handle multiple references to the same local.

let a = 1;
let b = () => a;
let c = () => a += 1;

B and c will get different versions of a.

jtenner avatar Aug 28 '19 20:08 jtenner

@jtenner You could get table and get function by index / ref. For example you have wasm:

(func $foo (result i32) (i32.const 1234))
(table (export "tbl") anyfunc (elem $foo))

than you js part:

WebAssembly.instantiateStreaming(fetch('main.wasm')).then(({ instance }) => {
  const table = instance.exports.tbl;
  console.log(table.get(0)());  // 1234
});

MaxGraey avatar Aug 28 '19 20:08 MaxGraey

It's likely we will need to allocate enclosed local values in a table or on the heap. Each pointer to those values will need to be stored in a table pointing to the heap.

This idea is Naive because the variables can no longer be treated like local variables because it's possible to modify local values before the function finishes executing.

jtenner avatar Aug 28 '19 20:08 jtenner

B and c will get different versions of a.

Why?

let a = 1;
let b = () => a;
let c = () => a += 1;

let br = b(); // 1
let cr = c(); // 2
// assert(a == 2)

Will generate:


class ClosureContextB { fn; a; }
class ClosureContextC { fn; a; }

function lambdaB(ctx: usize): i32 {
   return changetype<ClosureContextB>(ctx).a;
}

function lambdaC(ctx: usize): i32 {
   return changetype<ClosureContextC>(ctx).a += 1;
}

let a = 1;

let ctxB = new ClosureContextB(lambdaB, a);
let br = b(ctxB); // 1
// a = ctxB.a; // 1 unmodified so unnecessary

let ctxC = new ClosureContextB(lambdaC, a);
let cr = c(ctxC); // 2
a = ctxC.a; // 2

MaxGraey avatar Aug 28 '19 20:08 MaxGraey

I'm saying a in that example needs to exist on the heap for both b and c to access it, and the closure class needs to contain a Box<i32> that points to the heap location.

jtenner avatar Aug 28 '19 21:08 jtenner

No, we don't need pass plain types by boxed references. Also found pretty clear article. So ClosureContext (LexicalEnvironment) should be little bit modifier and also store reference to it's parent LexicalEnvironment.

MaxGraey avatar Aug 28 '19 21:08 MaxGraey

Regarding collection of closure contexts: Seems the idea here is to pass around a closure context (containing both the function index and the lexical scope) instead of just a function index. While that can be reference counted, it leads to a situation where something can be called with either a function index or a closure context, for example

function callIt(fn: () => void): void { fn(); }

function nop(): void {}
callIt(nop); // function index

let a: i32;
function ctx(): void { a = 1; }
callIt(ctx); // closure context

which means we'd either have to generate two versions of callIt (one taking a function index, one taking a closure context, otherwise doing the same) or doing only closure contexts, requiring every call to callIt to wrap function indexes in a temporary empty closure, which is an unnecessary allocation that is free'd immediately.

A better approach might be to utilize the multi-value spec, in that a closure is actually two values, a function index and a lexical environment, with the latter possibly being null.

dcodeIO avatar Aug 28 '19 21:08 dcodeIO

Yep. The issue I'm going to hit with a multivalue return is when these function pointers need to be utilized in JavaScript.

For instance, I want to call a describe function pointer that is nested. In order to do this from AssemblyScript, I need to export a function callIt that has 0 knowledge of lexical scope.

Edit: Could we have a primitive like lexscopeof(func)? That way I could at least manage it externally in javascript

jtenner avatar Aug 28 '19 21:08 jtenner

multi-value approach is definitely better but I'm not sure standalone VMs will support it. Its quite complicated even for compiler tools. (binaryen still not fully impement it). So having fallback (without multi-values support) it seems necessary even if not so efficient and require some allocation in heap

MaxGraey avatar Aug 28 '19 21:08 MaxGraey

Maybe one way to work around the allocation is to keep a singleton closure context per non-closure around in static memory. Like, if we know that the table has max 200 elements, make 200 dummies pre-populated with the function index and no lexical scope? Hmm

dcodeIO avatar Aug 28 '19 22:08 dcodeIO

Well at least 200 might not be enough. I can imagine taking advantage of this feature in aspect in very terrifying ways

jtenner avatar Aug 28 '19 22:08 jtenner

Firstly, I'd like to follow up on Daniel's example, would something like this work?

type context<T> = T extends Context | T extends Function
function callIt<context<T>>(fn: T): returnof<T> { //This function would need to change.
  if (isFunction<T>()){
    if isVoid<T>() {
     fn(); 
     return;
    }
    return fn();
  } 
  let ctx = changetype<Context>(fn);
  if (ctx.isVoid){ // Need to add a isVoid property.
     ctx.call()
     return
  }
  return ctx.call();
}

I read this article a year ago about how Elm handles first class functions in wasm: https://dev.to/briancarroll/elm-functions-in-webassembly-50ak

The big take away is that a function context is a function pointer, its current arity (or how many arguments it still has to take) and an ArrayBuffer of the arguments that have been passed.

When you have new function: let f = new Func(fn) and you the do f.call(..) You pass it the last parameter and you get a new function context which now takes one less argument. This continues until you have one argument left at which point the function pointer of the context is called. In the context of the article above, everything is immutable, which is why you get a clone of the context + the new argument. This way the intermediate functions that return functions of a smaller arity can be reused.

let add = (a: i32, b: 32): i32 => a + b;
let addOne = add(1); ==> this is now a function  `a: i32 => a + 1`;
let addTwo = add(2);
let two = addOne(1);
let three = addTwo(1);

This could look something like this:

class Func<Fn> {
   // fn: Fn; //function to be called when all arguments are present
   // airity: usize; // current airity

  get length(): usize {
    return lengthof<Fn>();
  }

  get currentArg(): usize {
    return this.length - this.arity;
  }
  
  constructor(public fn: Fn, public arity: usize, public args: ArrayBuffer){}

   static create<Fn>(fn: Fn, arity: usize = lengthof<Fn>(), args: new ArrayBuffer(sizeof<u64>() * lengthof<Fn>()): {
    //This isn't exactly right size the size of each parameter could vary. But for sake of simplicity let's assume they are all usize.
     //  Let's assume there is a builtin `paramType<Fn>(0)`
    type argType = paramType<Fn>(0);
    let func;
    if (arity > 1) ? paramType<Fn>(lengthof<Fn>() - arity + 1) : returnof<Fn>();
    let func = new Func<argType, Fn>(fn, arity, args);
    return func;
   }
 
  call<T>(arg: T): Func<Fn> | returnof<Fn>() {
    assert(arg instanceof paramType<Fn>(this.currentArg());
    if (arity == 0) { // arg is the last one and we can call the function 
      this.fn(...this.args); //This clearly needs to be a compiler operation to load the arguments as locals.  Or transform all paremeter references with offsets and pass the array/arraybuffer.
   }
   let args = this.args.clone();
   store<T>(args.dataStart + this.currentArg(), arg)
   return new Func<Fn>(fn, this.arity - 1, args);
  }
}

I know a lot of this isn't currently possible, just wanted to get the point across.

willemneal avatar Aug 30 '19 18:08 willemneal

Potential implementation idea:

  • A function reference is represented by a function table index currently.
  • Closures would be special function references with an attached context, means: managed allocations

Problem: The same function can be called with either a function table index or a closure pointer. Solution: We know that allocations have an alignment of 16 bytes, so if we make all function table indexes with that alignment invalid (essentially skipping every 16th function table index, making it the null function), a callee can check whether it is dealing with a function table index or a closure pointer by means of:

if (fn & 15) {
  call_indirect(fn, ...);
} else {
  ctx = __retain(fn);
  call_indirect(ctx.index, ...);
  __release(ctx);
}

Here, ctx is a global referencing the current closure context that must be set before a closure is being executed. A closure context is a class-like managed structure of the form:

ClosureContext extends HEADER {
  index: i32;
  closedOverVariable1: Type1;
  ...
  closedOverVariableN: TypeN;
}

Compilation:

When encountering a closed-over variable

  1. Remember that the current function is a closure
  2. For each such variable, lay these out as of ClosureContext layout in advance
  3. Replace what would be a local.get or local.set of each such variable with a load respectively store relative to current ctx.

When compiling a call to such a function

  1. Allocate the closure context and assign it to ctx
  2. Copy the function index, then each closed-over local's value to the closure context
  3. Call the function normally
  4. Copy the local values back to the caller's locals, or context if it is itself a closure
  5. Release the closure context

Performance impact:

  • Each closure call implies an allocation, retain and release, while non-closure calls do not
  • Each indirect call implies a (fairly predictable) branch checking for function table index vs closure.
  • Function references become refcounted only if !(val & 15) (means: is a closure pointer)

What am I missing? :)

dcodeIO avatar Nov 22 '19 22:11 dcodeIO

Isn't this approach cause to table fragmentation? I predict closures can be much much more often than ordinal indirect functions

MaxGraey avatar Nov 22 '19 23:11 MaxGraey

It'd cause the table to be 6,25% larger than it needs to be (every 16th element is the null function), but this is independent of how many closures there are since the function index of the closure is stored normally into the table in the other 93,75%, and the closure pointer itself doesn't live in the table but in memory.

Let's say we have 20 functions in the table, with the first 10 being normal functions and the latter 10 being closures (just for the sake of this example), then the table looks like:

Index Element
0 $null
1 $normal1
2 $normal2
...
10 $normal10
11 $closure1.index
12 $closure2.index
...
15 $closure5.index
16 $null
17 $closure6.index
18 $closure7.index
...
21 $closure10.index

So, at the expense of a 6.25% larger table, we gain the ability to tell from the value representing a function (for example what Array#map is called with) if it is a normal function we can just call_indirect ((fn & 15) != 0) or a pointer to a closure in memory that we first have to unwrap ((fn & 15) == 0, use ctx.index for the call_indirect) and to refcount.

For example, if Array#map is being called with fn = 16, its code will see that this is not a valid function table index but an address in memory (to a closure), yielding code like

if (fn & 15) {
  // this is a normal function index, i.e. $normal5
  call_indirect(fn, ...);
} else {
  // this is a memory address to a closure, i.e. $closure3
  ctx = changetype<ClosureContext>(fn);
  call_indirect(ctx.index, ...);
}

with the lifetime of each closure reference being tracked by the variables retaining a reference to that closure.

function outer() {
  var a: i32 = 0;
  return function inner() { return a++; };
}
var theClosure = outer(); // closure lives as long as there is at least one ref to it

dcodeIO avatar Nov 23 '19 06:11 dcodeIO

Btw the most simplest optimization for closures are Lambda Lifting which looks like:

function outer(n) {
  var x = 5;
  function inner(y) {
    return x + y;
  }
  return inner(n);
}

could be transform via lifting to simple pure function

function inner(y, x) {
  return x + y;
}

function outer(n) {
  var x = 5;
  return inner(n, x);
}

and later could be optimized by binaryen to

function inner(y) {
  return 5 + y;
}

function outer(n) {
  return inner(n);
}

and finally

function outer(n) {
  return 5 + n;
}

MaxGraey avatar Nov 24 '19 22:11 MaxGraey

Turns out that a clean way to represent closures isn't quite enough yet. There are various corner cases with phis (closure in one branch), loops (modifying a local before we know it's captured), multiple closures in the same scope (can't alias to a single memory address) and nested closures (propagation of changes through closure contexts). Need to think about this more.

dcodeIO avatar Nov 26 '19 12:11 dcodeIO

Appears that the foremost building block we need there is another pass that finds out about captured locals before actually compiling the function. This way, we can allocate a single shared scope upfront when compiling a function that contains one or more closures, and make the function itself use that scope for the locals captured by any closure. A function reference would then consist of a function table index and a reference to its scope, with the function references keeping the scope alive - i.e., one level of indirection.

dcodeIO avatar Nov 26 '19 21:11 dcodeIO

I'm sorry to intervene but @MaxGraey asked if I had any ideas about implementing closures on WASM couple days ago ('coz I was thinking about them too).

@dcodeIO could you please elaborate on your last idea? "A function reference would then consist of a function table index and a reference to its scope" - when we're returning or passing around a closure, what scope we're going to pass along? The closure's scope itself or parent function's scope? And then what if a closure has another closure inside it?

It gets pretty hairy pretty quick! :smile:

If we're going to push this idea further I think we'll need a linked list of scopes to accommodate nested closures. And then something akin to de Bruijn indexes to lookup variables in the right scope. Pretty much like you do in your regular lambda-calculus interpreter (or Lisp interpreter) with lexical scoping and a linked list of environments.

On the other hand, if you're going to represent closures as a "fat pointers" (table index in this case + a pointer to environment) you can get away with just per-closure environment capturing all necessary variables from all parent scopes. Basically, the same scheme as @MaxGraey suggested initially. I'm just not sure how acceptable that from JS-interop point of view.

I hope I've said something helpful and sorry again for interfering! :)

gabriel-fallen avatar Nov 29 '19 12:11 gabriel-fallen

you can get away with just per-closure environment capturing all necessary variables from all parent scopes

That's the latest I had on my mind, yeah. As an example, if we have

function outer() {
  var a = 0;
  function inner() {
    var b = 0;
    a++;
    return function innerInner() {
      return (a++) + (b++);
    }
  }
  return inner;
}

there will be one lexical environment on the heap, the one containing a in outer being extended with b in inner (there can be multiple bs in different scopes, each with its own memory address).

Now, the values representing inner and innerInner would be pointers to refcounted heap allocations containing the function table index and a pointer to that environment, which we can distinguish from a plain non-closure function table index by having an alignment of 16 bytes, so these are interchangeable and any function taking a function reference can deal with both kinds by means of a potentially well-predicted if.

GC-wise, the heap allocations for each function reference would keep the environment, which is itself a refcounted heap allocation, alive, so as soon as there is no more reference to innerInner or inner, the environment would become collected. Until then, the environment keeps its contents alive.

While this potentially wastes some memory if a closure that uses an extended var is collected before another one that doesn't capture the same var, it allows us to map every capture to one exact memory location across all related closures instead of having to synchronize values between multiple environments, or manage a tree or list of sorts which would probably be costly, so seems worth it. All functions accessing any capture would replace their use of the respective local with loads and stores on the environment, even outer, though it doesn't need a function reference for itself.

Ofc there are various optimization opportunities here, with functions not capturing anything remaining plain function table indexes, or non-sharing/unrelated closures being simplified to a single heap allocation consisting of its function table index and the environment in one slab, which both should be very common in real world code.

The one level of indirection can be absorbed upon unwrapping the function index pre-call by storing the (direct) pointer to the environment in a global that the called function will use as a base for its loads and stores.

Regarding JS interop, we gain the ability to pass opaque function references around in JS (like from an export to an import), no matter if these are plain function indexes or a pointer to the managed function reference, because all receivers can deal with both. Only if JS wants to know exactly what the function index is, it needs to unwrap it if !(ref & 15). Furthermore, since we do not insert any additional parameters to closure signatures, as suggested before if a capture is read only, there is no magic happening that would break a call if not taken care of.

Still not 100% certain about this, though, since two heap allocations in the worst case seem odd. Hope that makes sense :)

dcodeIO avatar Nov 29 '19 13:11 dcodeIO

@dcodeIO yeah, I read the discussion and got your idea about exploiting address alignment to discriminate between plain indexes and closures. Neat trick. :wink:

What I suspect (I'm not sure though) you can get rid of one level of indirection, at least get read of allocating GC'd closure structure on the heap, if you always use "fat pointers" for closures and function pointers. Your "fat pointer" would consist of a table ID and a pointer to environment. Kinda "unpacked closure structure" if you use two parameters for that. Or a "packed closure structure" if we can put them both into single i64 I guess?

The downside is, basically, all the functions that get called indirectly have to take an extra parameter - a pointer to environment, even if they close nothing and thus don't need one. On the other hand, you can pass at least i32 or f32 in place of unused environment pointer I guess?

I haven't thought it through but on the surface you don't waste your table elements, especially for the programs without closures...

gabriel-fallen avatar Nov 29 '19 14:11 gabriel-fallen