Lukasz Stafiniak
To avoid confusion with `Tensor.params`.
Inlining often leads to redundant repeated computations, sometimes lots of them, making common subexpression elimination much more valuable than it would be on its own.
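To make the interaction concrete, here is a minimal sketch of common subexpression elimination by hash-consing, written against a toy expression type. The types `expr`, `atom`, `op` and the function `cse` are hypothetical names made up for this illustration; they are not the project's actual IR or API. The sketch assumes pure expressions, so two structurally equal operations can always share one temporary.

```ocaml
(* Hypothetical toy expression language, for illustration only. *)
type expr =
  | Var of string
  | Const of float
  | Add of expr * expr
  | Mul of expr * expr

(* Flattened form: every operation takes atoms and defines one temporary. *)
type atom = AVar of string | AConst of float | ATmp of int
type op = OAdd of atom * atom | OMul of atom * atom

(* Returns the bindings in definition order, and the atom holding the result.
   Structurally equal operations are emitted once and then reused. *)
let cse (e : expr) : (int * op) list * atom =
  let table : (op, int) Hashtbl.t = Hashtbl.create 16 in
  let bindings = ref [] in
  let fresh = ref 0 in
  let emit op =
    match Hashtbl.find_opt table op with
    | Some id -> ATmp id (* already computed: reuse the temporary *)
    | None ->
        let id = !fresh in
        incr fresh;
        Hashtbl.add table op id;
        bindings := (id, op) :: !bindings;
        ATmp id
  in
  let rec go = function
    | Var v -> AVar v
    | Const c -> AConst c
    | Add (a, b) ->
        let a' = go a in
        let b' = go b in
        emit (OAdd (a', b'))
    | Mul (a, b) ->
        let a' = go a in
        let b' = go b in
        emit (OMul (a', b'))
  in
  let result = go e in
  (List.rev !bindings, result)
```

If inlining has duplicated `x + y` into both operands of a product, as in `cse (Mul (Add (Var "x", Var "y"), Add (Var "x", Var "y")))`, the sum is bound once and the product refers to that temporary twice. The more aggressively inlining duplicates subterms, the more such a pass recovers.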
This will allow more tensor nodes to become virtual.
The Metal backend forces us to do this, but it's good for us! `buffer_ptr` becomes `buffer_offset`. With this significant refactoring, we can also decide to rename things to maybe something...
At the least, remove redundant wrapping and unwrapping with `Some`. The aim is to make the translations readable, should anyone inspect them for educational or debugging purposes.
I'm postponing working on this. There's non-determinism even when using a single stream with multicore_cc, but only in the test/training/bigram.ml example -- and non-determinism in moons_demo_parallel but that's more opportunities...
Add a field to `Local_scope` to track this and populate it from the `recurrent` field of `traced_array` IIRC. So this should be easy.
They implement an interpreter on the GPU; maybe we can avoid that and still use their solutions for within-kernel synchronization. Or maybe we can go the interpreter route, to be...
The MCP doc should emphasize that "goto ; expand" is more efficient than the commands that emulate arrow and Enter key presses.