Utku Evci
I also tried running this with alpha=0; it looks slightly better at the start, but still explodes after 1-2k steps.
Updated the markdown cell for the derivative. (1) `zero_grad` is needed since `model.parameters()` are not the only nodes that accumulate gradients. We have to zero the grad on the entire...
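To illustrate the point above, here is a minimal sketch (a hypothetical micrograd-style `Value` class, not the actual repo code) showing why zeroing only the leaf parameters is not enough: intermediate nodes accumulate `.grad` too, so repeated `backward()` calls double-count unless every node in the graph is reset.

```python
class Value:
    """Tiny autograd node: every node, not just parameters, has a .grad."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # grads ACCUMULATE (+=), which is why stale values are a problem
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological order, then propagate grads from the output down
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

def zero_grad(root):
    """Zero grads by walking the WHOLE graph, not just the parameter list."""
    seen = set()
    def visit(v):
        if v not in seen:
            seen.add(v)
            v.grad = 0.0
            for c in v._children:
                visit(c)
    visit(root)
```

For example, after `y = w * x; y.backward()` we get `w.grad == 3.0` for `w=2, x=3`; calling `y.backward()` again without `zero_grad(y)` accumulates to `6.0`.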
Speed is not a concern?? Let's do sparse layers then :P (already did). Indeed, I missed the `pow` after sync; updated the code and added tests.
For some reason the `grad = 0` line was removed before the last call; added it back. One option could be passing the parameters to the `backward()` call (kind of TF style)....
Hi Andrej, just tested the code by running the colabs and verified they are working. Training is a bit slower than before since we enabled tracking during backprop by default....
Thanks, Tyler! Tests are passing in this branch, too.
I played with higher-order grads a bit in my fork. It was this plus some small changes. Created a PR: #8
I agree it would be nice to support this. I've implemented ERK in a hacky way in one of our recent projects. The tricky thing is the layer parameter shapes...
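For reference, the per-layer scaling can be sketched as below. This is a simplified standalone version (the helper name is hypothetical, not the code from our project): ERK scores each layer by `sum(dims) / prod(dims)`, scales all scores by a common factor to hit the target overall density, and caps any layer that would exceed density 1.0 at fully dense.

```python
import numpy as np

def erk_densities(shapes, target_density):
    """Per-layer densities under Erdos-Renyi-Kernel (ERK) scaling.

    shapes: list of weight-tensor shapes, e.g. [(3, 3, 16, 32), (512, 10)].
    target_density: desired overall fraction of nonzero weights.
    """
    n = [int(np.prod(s)) for s in shapes]          # params per layer
    raw = [sum(s) / np.prod(s) for s in shapes]    # ERK score per layer
    dense = set()                                  # layers capped at 1.0
    while True:
        # solve for the scale eps so total nonzeros match the target,
        # treating already-capped layers as fully dense
        budget = target_density * sum(n) - sum(n[i] for i in dense)
        divisor = sum(raw[i] * n[i] for i in range(len(shapes))
                      if i not in dense)
        eps = budget / divisor
        # any layer pushed above density 1.0 becomes fully dense; re-solve
        capped = {i for i in range(len(shapes))
                  if i not in dense and eps * raw[i] > 1.0}
        if not capped:
            break
        dense |= capped
    return [1.0 if i in dense else eps * raw[i] for i in range(len(shapes))]
```

The capping loop is exactly where the layer shapes make things tricky: small layers (e.g. a final logits layer) often get capped at fully dense, and the remaining budget has to be redistributed over the big layers.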
Hi, the first one is the legacy network function and will likely be removed in the future. You should use the second method to define networks, i.e. `tf.keras.Model`. Dopamine...
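In case it helps, here is a minimal sketch of the `tf.keras.Model` style (the class name and layer sizes below are made up for illustration, not Dopamine's actual networks):

```python
import tensorflow as tf

class QNetwork(tf.keras.Model):
    """Toy value network in the tf.keras.Model style."""
    def __init__(self, num_actions):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(64, activation='relu')
        self.q_values = tf.keras.layers.Dense(num_actions)

    def call(self, observations):
        return self.q_values(self.hidden(observations))
```

Subclassing `tf.keras.Model` gives you variable tracking, `trainable_variables`, and checkpointing for free, which is what the legacy network-function style lacks.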
Thanks for creating the issue. I need to look at my experiments/notes to remember what I did. I might have converted the checkpoints to tf2 to enable finetuning. I'll check...