pytorch-lightning-snippets

Some thoughts + questions

Open TylerYep opened this issue 4 years ago • 2 comments

Hey, thank you so much for writing this implementation up! This is a feature I've wanted to see in pytorch-lightning for a long time, but I never got the chance to work on it.

In the BatchGradient verifier, we pop the index containing the batch that we are testing for. However, I think it would be preferable to also verify that the gradient of that popped batch is in fact non-zero, since a gradient of all zeros would pass our test but would not train the network at all. My example code is here: verify.py
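The idea can be sketched roughly as follows. This is only an illustration and not the contents of verify.py or the repository's verifier: the function name `check_batch_independence` and its arguments are hypothetical. It backpropagates from one sample's output and checks both that no gradient reaches the other samples and that the chosen sample's own gradient is non-zero.

```python
import torch
from torch import nn


def check_batch_independence(model: nn.Module, batch: torch.Tensor, sample_idx: int = 0) -> bool:
    """Hypothetical helper: True if output[sample_idx] depends only on
    input[sample_idx] AND that dependence yields a non-zero gradient."""
    inputs = batch.clone().requires_grad_(True)
    output = model(inputs)

    # Backpropagate from a single sample's output only.
    output[sample_idx].sum().backward()
    grad = inputs.grad

    # Total absolute gradient per sample in the batch.
    per_sample = grad.reshape(grad.shape[0], -1).abs().sum(dim=1)

    # Mask selecting every sample except the one under test.
    mask = torch.ones(grad.shape[0], dtype=torch.bool, device=grad.device)
    mask[sample_idx] = False

    no_leak = bool((per_sample[mask] == 0).all())    # other samples untouched
    has_signal = bool(per_sample[sample_idx] > 0)    # tested sample not all zeros
    return no_leak and has_signal
```

With only the first condition, a model whose output ignores its input entirely would pass; the second condition catches that case. The sketch leaves the model's train/eval mode to the caller.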

Finally, in my own projects, I wrote a few other verification functions, but I believe they are already handled by Lightning. Could you verify this? I am referring to:

  • Issuing a warning if train() is turned on but all layers are frozen
  • Any NaN or INF value is present in gradients or any weights
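For reference, the first check in the list could look something like the sketch below. The function name `warn_if_all_frozen` is hypothetical and this is not an existing Lightning feature; it only uses `model.training` and each parameter's `requires_grad` flag.

```python
import warnings

from torch import nn


def warn_if_all_frozen(model: nn.Module) -> bool:
    """Hypothetical helper: warn (and return True) if the model is in
    training mode but none of its parameters can receive gradients."""
    params = list(model.parameters())
    all_frozen = len(params) > 0 and not any(p.requires_grad for p in params)
    if model.training and all_frozen:
        warnings.warn(
            "Model is in training mode but every parameter has requires_grad=False.",
            UserWarning,
        )
        return True
    return False
```

A model with no parameters at all is deliberately not reported, since there is nothing to unfreeze.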

If I have some spare time I will try testing this code myself, but overall it looks really great! 👍

Aug 25 '20 00:08 TylerYep

@TylerYep Thank you very much for the feedback. I only just saw this message, sorry!

> However, I think it would be preferable to also verify that the gradient of that popped batch is in fact non-zero, since a gradient of all zeros would pass our test but would not train the network at all.

Very good observation, I will include that! EDIT: done here: https://github.com/awaelchli/pytorch-lightning-snippets/commit/9527fbacf1ed4e10748d8fa066316450076015a0

> Issuing a warning if train() is turned on but all layers are frozen

I am not aware of such a feature in Lightning :)

> Any NaN or INF value is present in gradients or any weights

Yes, but one needs to turn it on with a Trainer flag. Searching for these values every iteration can impact performance, so it is not on by default.
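To make the cost concrete, here is a minimal manual version of such a check. It is a sketch only, not Lightning's implementation, and `find_non_finite` is a hypothetical name. It has to scan every weight and every gradient tensor, which is why running it on each iteration is not free.

```python
import torch
from torch import nn


def find_non_finite(model: nn.Module) -> list:
    """Hypothetical helper: list the parameters whose weights or
    gradients contain NaN or Inf values."""
    bad = []
    for name, param in model.named_parameters():
        # torch.isfinite is False for NaN, +Inf and -Inf.
        if not torch.isfinite(param).all():
            bad.append(f"{name} (weight)")
        # Gradients only exist after a backward pass.
        if param.grad is not None and not torch.isfinite(param.grad).all():
            bad.append(f"{name} (grad)")
    return bad
```

An empty list means every weight and every available gradient is finite.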

Sep 07 '20 06:09 awaelchli

I also see that your verify.py has a function that runs all tests at once. That seems very convenient. 👍

Sep 07 '20 06:09 awaelchli