calyx
calyx copied to clipboard
[Interpreter: Optimization] Use dataflow ordering for stabilization rather than looping
Just going to write this out to get it out of my head.
#828 added support for re-ordering assignments into a dataflow order. This could enable a (possibly) faster algorithm for the interpreter's combinational time stabilization loop (the inner loop). Currently this process looks like:
For all assignments A
evaluate A under env E
Commit all updates from the assignments (at once) to get E'
Run combinational paths of all cells under E' and commit all updates (at once) to get E''
Repeat until E stops changing
With dataflow ordering we could imagine an alternate algorithm which does not have a loop and looks something like:
Iterate through assignments collecting assignments based on the destination cell. Once an assignment is encountered which is attached to different cell, commit all the updates from the collected assignments to get E'. Evaluate the collected assignment's cell under E' and commit the updates to get E''. Repeat the process for the remaining assignments under E''.
This should only require a single pass over the assignments for the stabilization rather than iterating to convergence which would likely make the interpreter faster. This does require reworking a lot of code and loses the nice two-phase approach the current stabilization algorithm but could be worth it. The code for this version is likely to be a bit more frustrating as mutable and immutable portions overlap which will require some wrangling of the iteration, though I believe it would be manageable.
I am not sure how this particular model handles un-driven values and whether it could introduce any bugs.
This looks good! This is essentially the exact model of interpretation used by the symbolic evaluator and as far as I can tell, hasn't uncovered any problems. The one caveat is that when you simulate a group, you need to also sort the assignments w.r.t to the current set of continuous assignment and then execute all the assignments.
This sorted can be costly so one thing to do there is to memoize the set of reads and writes performed by the continuous assignments and use that to either place all the continuous assignments before a group's assignments or after (I think anything else might not be possible?).
Yeah definitely seems like there's room for some clever caching/memoization. There's potentially some room for deciding to do the sorting during runtime as in some cases it may be faster to just simulate the group inefficiently than pay the sorting cost, though it's hard for me to reason about how common that would be
@EclecticGriffin is there something specific we need to discuss for this pass? If not, can we remove "discussion needed" and make it "available"?
We can move this to available
Closing this for the moment, will probably re-open later after some of the interpreter redesign efforts are further along