Consider: Refactor / rewrite gradient management from storing backprop computations to storing backprop tensors
This is high priority as it has a big design impact: it removes the "differentiable / non-differentiable tensors" distinction.
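For concreteness, a minimal sketch of the two designs. All names here (`ndarray`, `code`, the record fields) are hypothetical stand-ins, not OCANNL's actual definitions:

```ocaml
(* Hypothetical placeholder types, not OCANNL's actual definitions. *)
type ndarray (* tensor data *)
type code    (* a staged backprop computation *)

(* Current design: the diff part stores the backprop computation,
   so only "differentiable" tensors carry one. *)
type diff_as_code = { backprop : code }
type tensor_v1 = { value : ndarray; diff : diff_as_code option }

(* Proposed design: store the backprop (gradient) tensor instead;
   every tensor can carry a gradient, so the differentiable /
   non-differentiable distinction disappears. *)
type tensor_v2 = { value : ndarray; grad : ndarray }
```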
This does not get rid of global session state management with implicit differentiated roots. I don't rule that out as a follow-up refactoring.
Postponing this: I need to discuss the options; this does not look like an indisputable win.
Let's suspend this for now, as it is a departure from the original OCANNL design; maybe stupid, but let's see first... What might be interesting is to add support for forward-mode autodiff within the existing design! I.e., as another field on the diff part of (differentiable) tensors -- a map from parameters to the gradients w.r.t. them.
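A hedged sketch of that forward-mode extension, again with hypothetical names (`Param_map`, `tangents`) rather than OCANNL's actual API:

```ocaml
(* Hypothetical sketch; names do not reflect OCANNL's actual API. *)
type ndarray
type param_id = int

(* Tangents keyed by parameter id (param_id = int). *)
module Param_map = Map.Make (Int)

(* The diff part of a differentiable tensor, extended for forward mode:
   alongside the reverse-mode gradient, keep a map from each parameter
   to the derivative of this tensor w.r.t. that parameter. *)
type diff = {
  grad : ndarray;                  (* reverse-mode gradient *)
  tangents : ndarray Param_map.t;  (* forward mode: param_id -> d self / d param *)
}

type tensor = { value : ndarray; diff : diff option }
```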