Utku Evci
I also tried running this with alpha=0; it looks slightly better at the start, but still explodes after 1-2k steps.
Updated the markdown cell for the derivative. (1) `zero_grad` is needed since `model.parameters()` are not the only nodes that accumulate gradients. We have to zero the grad on the entire...
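To illustrate the point above, here is a minimal sketch (a hypothetical micrograd-style `Value` class, not the actual repo code) showing why zeroing only the leaf parameters is not enough: intermediate nodes accumulate `.grad` too, so repeated `backward()` calls double-count unless every node in the graph is reset.

```python
class Value:
    """Tiny autograd node: every node, not just parameters, has a .grad."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # grads ACCUMULATE (+=), which is why stale values are a problem
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological order, then propagate grads from the output down
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

def zero_grad(root):
    """Zero grads by walking the WHOLE graph, not just the parameter list."""
    seen = set()
    def visit(v):
        if v not in seen:
            seen.add(v)
            v.grad = 0.0
            for c in v._children:
                visit(c)
    visit(root)
```

For example, after `y = w * x; y.backward()` we get `w.grad == 3.0` for `w=2, x=3`; calling `y.backward()` again without `zero_grad(y)` accumulates to `6.0`.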
Speed is not a concern?? Let's do sparse layers then :P (already did). Indeed, I missed the `pow` after sync; updated the code and added tests.
For some reason the `grad = 0` line was removed before the last call; added it back. One option could be passing the parameters to the `backward()` call (kind of TF style)....
Hi Andrej, just tested the code by running the colabs and verified they are working. Training is a bit slower than before since we enabled tracking during backprop by default....
Thanks, Tyler! Tests are passing in this branch, too.
I played with higher-order grads a bit in my fork. It was this plus some small changes. Created a PR: #8
I agree it would be nice to support this. I've implemented ERK in a hacky way in one of our recent projects. The tricky thing is the layer parameter shapes...
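For reference, the per-layer scaling can be sketched as below. This is a simplified standalone version (the helper name is hypothetical, not the code from our project): ERK scores each layer by `sum(dims) / prod(dims)`, scales all scores by a common factor to hit the target overall density, and caps any layer that would exceed density 1.0 at fully dense.

```python
import numpy as np

def erk_densities(shapes, target_density):
    """Per-layer densities under Erdos-Renyi-Kernel (ERK) scaling.

    shapes: list of weight-tensor shapes, e.g. [(3, 3, 16, 32), (512, 10)].
    target_density: desired overall fraction of nonzero weights.
    """
    n = [int(np.prod(s)) for s in shapes]          # params per layer
    raw = [sum(s) / np.prod(s) for s in shapes]    # ERK score per layer
    dense = set()                                  # layers capped at 1.0
    while True:
        # solve for the scale eps so total nonzeros match the target,
        # treating already-capped layers as fully dense
        budget = target_density * sum(n) - sum(n[i] for i in dense)
        divisor = sum(raw[i] * n[i] for i in range(len(shapes))
                      if i not in dense)
        eps = budget / divisor
        # any layer pushed above density 1.0 becomes fully dense; re-solve
        capped = {i for i in range(len(shapes))
                  if i not in dense and eps * raw[i] > 1.0}
        if not capped:
            break
        dense |= capped
    return [1.0 if i in dense else eps * raw[i] for i in range(len(shapes))]
```

The capping loop is exactly where the layer shapes make things tricky: small layers (e.g. a final logits layer) often get capped at fully dense, and the remaining budget has to be redistributed over the big layers.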
Hi, the first one is the legacy network function and will likely be removed in the future. You should use the second method to define networks, i.e. `tf.keras.Model`. Dopamine...
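In case it helps, here is a minimal sketch of the `tf.keras.Model` style (the class name and layer sizes below are made up for illustration, not Dopamine's actual networks):

```python
import tensorflow as tf

class QNetwork(tf.keras.Model):
    """Toy value network in the tf.keras.Model style."""
    def __init__(self, num_actions):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(64, activation='relu')
        self.q_values = tf.keras.layers.Dense(num_actions)

    def call(self, observations):
        return self.q_values(self.hidden(observations))
```

Subclassing `tf.keras.Model` gives you variable tracking, `trainable_variables`, and checkpointing for free, which is what the legacy network-function style lacks.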
Thanks for creating the issue. I need to look at my experiments/notes to remember what I did. I might have converted the checkpoints to tf2 to enable finetuning. I'll check...