Lukasz Stafiniak
Also, inside `arrayjit`, split off the high-level representation into its own small module. The original plan was to split off a library `nndarray` first, containing `Ndarray` and `Node`, with global state...
Fortunately, by their nature such arrays are not needed on the host, so we just need to make sure they are correctly initialized on the devices (i.e. the `from_host` direction)....
It's meant for both trainable parameters and non-trainable, in-place-updatable inputs.
This assumption simplifies optimizations; we already make it in some cases when inferring memory modes.
We still need to designate one of the devices as hosting the ndarray. A device can refuse to host. The interface is simply the device returning an ndarray -- only uniform-memory...
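A minimal sketch of what such an interface could look like. The `Device` signature, the `offer_hosting` name, and the `float array` stand-in for the real ndarray type are all assumptions for illustration, not the actual API:

```ocaml
(* Hypothetical sketch: a device either agrees to host a tensor's
   ndarray (uniform-memory devices only) or refuses with None. *)
type ndarray = float array (* stand-in for the real Ndarray.t *)

module type Device = sig
  val name : string
  val uniform_memory : bool

  (* Returns the hosted ndarray, or None if this device refuses. *)
  val offer_hosting : dims:int array -> ndarray option
end

module Cpu : Device = struct
  let name = "cpu"
  let uniform_memory = true

  let offer_hosting ~dims =
    if uniform_memory then
      Some (Array.make (Array.fold_left ( * ) 1 dims) 0.0)
    else None
end

(* Designate the first device willing to host the ndarray. *)
let designate_host devices ~dims =
  List.find_map (fun (module D : Device) -> D.offer_hosting ~dims) devices

let () =
  match designate_host [ (module Cpu : Device) ] ~dims:[| 2; 3 |] with
  | Some arr -> assert (Array.length arr = 6)
  | None -> assert false
```

Returning an `option` keeps refusal explicit without a separate capability-query step.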
That's what the doc says, but it works with regular OCaml strings.
[polars-ocaml is a project to provide idiomatic OCaml bindings to the Polars dataframe library.](https://github.com/mt-caret/polars-ocaml)
Environments grow too large -- there are too many row variables. Maybe also consider hashconsing for computing substitutions.
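The hashconsing idea could look roughly like the sketch below: structurally equal terms are allocated once and then compared by physical identity, so substitution results built from the same pieces are shared. The `row` type and constructor names are illustrative, not OCANNL's actual shape-inference representation:

```ocaml
(* Illustrative hashconsing via a table keyed on structural equality. *)
type row =
  | Empty
  | Var of int (* row variable *)
  | Dim of int * row (* a dimension consed onto a row *)

let table : (row, row) Hashtbl.t = Hashtbl.create 256

let hashcons (t : row) : row =
  match Hashtbl.find_opt table t with
  | Some shared -> shared (* reuse the existing allocation *)
  | None ->
      Hashtbl.add table t t;
      t

(* Smart constructor: all Dim cells go through the table. *)
let dim d r = hashcons (Dim (d, hashcons r))

let () =
  let a = dim 3 (Var 1) in
  let b = dim 3 (Var 1) in
  (* Physical equality, not just structural equality. *)
  assert (a == b)
```

With sharing in place, substitution can memoize on physical identity, which is what keeps repeated substitutions from blowing up.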
Implement as many optimizations as reasonable from these posts:
* [Fast Multidimensional Matrix Multiplication on CPU from Scratch](https://siboehm.com/articles/22/Fast-MMM-on-CPU)
* [How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a...
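The first optimization in the CPU post -- reordering the loops from i-j-k to i-k-j so the innermost accesses are contiguous in row-major order -- can be sketched as below. This is a toy illustration on flat `float array`s, not arrayjit's generated code:

```ocaml
(* Loop-reordered matmul: C (m x n) = A (m x k) * B (k x n), all
   row-major. With the p loop in the middle, the innermost j loop
   reads b and writes c contiguously, the first big cache win from
   the linked post. *)
let matmul ~m ~n ~k a b =
  let c = Array.make (m * n) 0.0 in
  for i = 0 to m - 1 do
    for p = 0 to k - 1 do
      let aip = a.((i * k) + p) in
      for j = 0 to n - 1 do
        c.((i * n) + j) <- c.((i * n) + j) +. (aip *. b.((p * n) + j))
      done
    done
  done;
  c

let () =
  (* [[1;2];[3;4]] * [[5;6];[7;8]] = [[19;22];[43;50]] *)
  let c = matmul ~m:2 ~n:2 ~k:2 [| 1.; 2.; 3.; 4. |] [| 5.; 6.; 7.; 8. |] in
  assert (c = [| 19.; 22.; 43.; 50. |])
```

Tiling, vectorization, and the CUDA-specific techniques from the second post layer on top of this same access-pattern reasoning.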
The `%op` extension keeps track of the to-be-tensor's label via `?ident_label`. We would need to modify:

```ocaml
| [%expr fun [%p? pat] -> [%e? body]] ->
    let vbs, body = ...
```