Consider: Refactor / rewrite gradient management from storing backprop computations to storing backprop tensors
This is high priority as it has a big design impact: it removes the "differentiable / non-differentiable tensors" distinction.
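For concreteness, a minimal sketch of the two designs. All names here (`ndarray`, `code`, the record fields) are hypothetical stand-ins, not OCANNL's actual definitions:

```ocaml
(* Hypothetical placeholder types, not OCANNL's actual definitions. *)
type ndarray (* tensor data *)
type code    (* a staged backprop computation *)

(* Current design: the diff part stores the backprop computation,
   so only "differentiable" tensors carry one. *)
type diff_as_code = { backprop : code }
type tensor_v1 = { value : ndarray; diff : diff_as_code option }

(* Proposed design: store the backprop (gradient) tensor instead;
   every tensor can carry a gradient, so the differentiable /
   non-differentiable distinction disappears. *)
type tensor_v2 = { value : ndarray; grad : ndarray }
```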
This does not get rid of global session state management with implicit differentiated roots. I don't rule that out as a follow-up refactoring.
Postponing this: I need to discuss the options; this does not look like an indisputable win.
Let's suspend this for now, as it is a departure from the original OCANNL design; maybe stupid, but let's see first... What might be interesting is to add support for forward-mode autodiff within the existing design! I.e., as another field on the diff part of (differentiable) tensors -- a map from parameters to the gradients w.r.t. them.
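A hedged sketch of that forward-mode extension, again with hypothetical names (`Param_map`, `tangents`) rather than OCANNL's actual API:

```ocaml
(* Hypothetical sketch; names do not reflect OCANNL's actual API. *)
type ndarray
type param_id = int

(* Tangents keyed by parameter id (param_id = int). *)
module Param_map = Map.Make (Int)

(* The diff part of a differentiable tensor, extended for forward mode:
   alongside the reverse-mode gradient, keep a map from each parameter
   to the derivative of this tensor w.r.t. that parameter. *)
type diff = {
  grad : ndarray;                  (* reverse-mode gradient *)
  tangents : ndarray Param_map.t;  (* forward mode: param_id -> d self / d param *)
}

type tensor = { value : ndarray; diff : diff option }
```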