Enable (and ensure proper) memory management for arrays embedded in tensors that are temporary or appear only in non-top-level positions
Fortunately, by their nature such arrays are not needed on the host, so we only need to make sure they are correctly initialized on the devices (i.e. the from_host direction); there is no to_host direction to worry about.
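As a rough illustration of the intended invariant (a hedged sketch; [memory_mode], [node], [copy_from_host] and [alloc_zeroed] are made-up names, not the actual OCANNL API): hosted arrays get copied in from the host, while device-only arrays just need to be allocated and initialized on the device itself.

```ocaml
(* Hypothetical sketch, not the actual OCANNL API: the type and function
   names here are illustrative. *)
type memory_mode = Virtual | Hosted | Device_only

type node = {
  label : string;
  mode : memory_mode;
  mutable device_initialized : bool;
}

(* Ensure every array a routine depends on is live on its device. Hosted
   arrays are copied in (the from_host direction); device-only arrays just
   need their buffers allocated and initialized on the device, since by
   assumption nothing reads them on the host. *)
let ensure_on_device ~copy_from_host ~alloc_zeroed nodes =
  List.iter
    (fun n ->
      match n.mode with
      | Virtual -> () (* inlined into the computation; no array exists *)
      | Hosted -> copy_from_host n
      | Device_only ->
          if not n.device_initialized then (
            alloc_zeroed n;
            n.device_initialized <- true))
    nodes
```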
The prominent case is temporary tensors in optimizer updates (currently, Train.sgd_one).
To clarify the wording: the tensors are temporary, but the arrays persist across optimizer steps. This distinction makes sense in OCANNL -- in PyTorch and similar frameworks the arrays are the tensors. It's better to avoid using "temporary tensor" in this sense more broadly, as it would be confusing. A toy model of the pattern follows.
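This is a minimal self-contained model of the pattern only, not the real Train.sgd_one: plain OCaml float arrays stand in for on-device arrays. The momentum buffer belongs to a tensor that is "temporary" in the above sense (it never appears at top level), yet its array must persist across optimizer steps -- here via the returned closure.

```ocaml
(* Illustrative model, not Train.sgd_one: float arrays stand in for
   on-device arrays. [buf] is the array of a non-top-level tensor, and it
   must carry state between calls to the step function. *)
let sgd_one ~lr ~momentum ~params =
  let buf = Array.make (Array.length params) 0.0 in
  (* [buf] persists across steps via this closure *)
  fun grads ->
    Array.iteri
      (fun i g ->
        buf.(i) <- (momentum *. buf.(i)) +. g;
        params.(i) <- params.(i) -. (lr *. buf.(i)))
      grads
```

Each call to the returned step function reuses buf; that array's device-side lifetime is exactly what this issue is about.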
I no longer remember what the issue is about... Maybe memory modes already make the situation clearer.
There's a related issue that I will fix tomorrow: at link time, require that every tensor node in a tensor's graph is either embedded, or already part of the parent context (the context passed to link, which becomes the parent of the new context).
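A sketch of what such a link-time check could look like (hedged: Ctx_set, embedded, referenced, and the string-keyed set are made-up names and representations, not OCANNL's actual context or tensor-node types):

```ocaml
(* Hypothetical sketch of the proposed link-time requirement. *)
module Ctx_set = Set.Make (String)

exception Unresolved_node of string

(* Every tensor node referenced by the routine must be either embedded in
   the tensor's own graph, or already present in the parent context. *)
let check_link ~parent_ctx ~embedded ~referenced =
  List.iter
    (fun node ->
      if not (Ctx_set.mem node embedded || Ctx_set.mem node parent_ctx) then
        raise (Unresolved_node node))
    referenced
```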