Big speedup
With these changes I could speed up execution considerably. On my machine the run now takes about 14 seconds instead of over 2 minutes (including saving the plot as SVG; 12 seconds without it).
The changes are:
Because the expression tree is always exactly the same (when not batching), it can be reused: instead of re-creating the whole tree every time, the new `refresh()` method recalculates all the values in-place. This method also sets `grad` to `0.0` in every node, so `zero_grad()` isn't needed in the optimization loop anymore.
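Here is a minimal sketch of what such a `refresh()` could look like on micrograd's `Value` class. The `_forward` closure is an assumption: stock micrograd does not keep a re-runnable forward function per node, so this presumes each op stored one when the node was created; the actual implementation may differ.

```python
# Assumed addition to each op, e.g. in Value.__add__, next to _backward:
#     def _forward():
#         out.data = self.data + other.data
#     out._forward = _forward
# Leaf Values would have self._forward = None set in __init__.

def refresh(self):
    # build a topological order, leaves first (same traversal as backward())
    topo, visited = [], set()
    def build(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build(child)
            topo.append(v)
    build(self)
    # recompute every node's data in place and reset its gradient,
    # which is why a separate zero_grad() pass is no longer needed
    for v in topo:
        if v._forward is not None:  # leaves keep their (possibly updated) data
            v._forward()
        v.grad = 0.0
```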
Note that for this improvement to work with batching, you need a little more preparation. You need to create `Xb` and `yb`, dummy lists of batch size containing `Value` objects, and use these to set up the calculations before the optimization loop. In the `loss()` function you then need to make the random index selection and update the `value.data` attributes of all the objects in `Xb` and `yb` before running `refresh()`, as sketched below. ~~I didn't include an example for that at the moment.~~ I also added code to the demo that demonstrates batching.
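A hypothetical sketch of that batching setup, assuming the demo's `X`, `y` arrays (two features, labels ±1 as in the demo's max-margin loss), an already-constructed `model`, and the modified engine with the `refresh()` method sketched above; `Xb`, `yb`, and `batch_size` follow the comment, everything else is illustrative.

```python
import random
from micrograd.engine import Value

batch_size = 32
# dummy Value objects; the graph is built on these once, up front
Xb = [[Value(0.0) for _ in range(2)] for _ in range(batch_size)]
yb = [Value(0.0) for _ in range(batch_size)]

# build the computation graph once, before the optimization loop
scores = [model(xrow) for xrow in Xb]
losses = [(1 + -yi * si).relu() for yi, si in zip(yb, scores)]
total_loss = sum(losses) * (1.0 / batch_size)

def loss():
    # pick a fresh random batch and write it into the existing Value objects
    idx = random.sample(range(len(X)), batch_size)
    for row, i in zip(Xb, idx):
        for v, x in zip(row, X[i]):
            v.data = x
    for v, i in zip(yb, idx):
        v.data = y[i]
    total_loss.refresh()  # recompute all values in place; grads reset to 0.0
    return total_loss
```

In the training loop you would then call `loss()`, `total_loss.backward()`, and update the parameters; no `zero_grad()` call is needed because `refresh()` already reset every `grad`.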
Good idea, but it also makes this project less intuitive with non-essential optimizations.