budlang
budlang copied to clipboard
Add closure support to the virtual machine
As per Discord, the virtual machine lacks some functionality that is needed to support closures. Specifically, there is no way to jump to an instruction and have another instruction jump back to the location jumped from. All instruction offsets are hard-coded and can't come from a variable.
In thinking about how I would want to design this, I want to design closure support with these constraints in mind:
- Pausing execution inside of a closure needs to be resumeable.
- Variable access needs to be protected: a closure should be able to access its own variables, but not be able to access other closure's variables. Closures should be able to access local variables from the function that is invoking the closure.
With these ideas in mind, I am thinking it would be great to give closures their own type of stack frame -- change StackFrame to an enum, and have Function and Closure variants. ValueOrSource would need a new option: ClosureVariable. When resolving a ClosureVariable, the stack can be checked to ensure a closure is executing and that the index is within the allocated region of the closure.
To invoke a closure, a new instruction is needed: CallClosure which takes an instruction offset and number of arguments on the stack.
If needed, the Return behavior can be updated based on whether execution is returning from a closure or from a function.
These are all notes from brainstorming about this idea -- the actual implementation may need to be different than this for reasons that won't be discovered until implementation begins.
- Variable access needs to be protected: a closure should be able to access its own variables, but not be able to access other closure's variables. Closures should be able to access local variables from the function that is invoking the closure.
Slight wording thing:
I think it should say: Closures should be able to access local variables from the function that is defining the closure. (because it could be invoked in a different function).
That's very true, and maybe changes how I would implement this. Perhaps closures should be more well defined than just being a jumped-to-instruction in the Vec<Instruction>. I hadn't thought as far as allowing a closure/function to be stored in a Value, but to allow something like that, we need Closures to be a standalone type.
I hadn't thought as far as allowing a closure/function to be stored in a Value, but to allow something like that, we need Closures to be a standalone type.
well the main usage of closures is higher order functions.
if you don't leave your current scope, normal jumps actually do the trick as well.
They would need to be able to be represented by Value. We could extend it by a "Closure" Variant, that would need to hold
- Instructions (Could be jump points)
- Arguments
- Varisbles of definition context (Stack range)
When you call a Closure it would get:
- stack-range for arguments
- stack-range for its definition context's variables
- stack end for it's own stuff
- Jump to closure beginning
mostly normal execution but
- differentiation of closure variable access and parent scope access
- a return jumps back to call site of closure (generally the same as for functions)
- clear closure variable count from stack
If I'm not missing something this is pretty strait forward.
Edgecase for variable access would be a multilayer closure. (there are two parent layers).
Edgecase for variable access would be a multilayer closure. (there are two parent layers).
this could be done by providing a scope index for outer variables? e.g. 0 would be the direct parent, 1 the grandparent. this could also be used with 0 being self.
This would also necessitate that a closure carries all parent scopes.
This is all great stuff. I definitely hadn't finished thinking through the edge cases. One thing that I realized too is that some of these variable-sized things are things that are static -- known at compile time. This means that some of this data can live on the Closure type itself, and the stack frame only needs to have the unique information for the specific invocation.
Another issue to combat is preventing a closure that captures a local context from being used outside of the local context. Stack range isn't enough, because if you return a closure from a function, then call a new function passing what was returned, you could end up with a valid stack range but it's now for a different function. I can think of several ideas on how to throw an error in these situations, but they all seem ugly or perhaps might impose more overhead than I'd want.