returnn
returnn copied to clipboard
Gradient checkpointing experiments
I'm using this PR to run the CI. Albert don't bother reviewing yet.
I'm using this PR to run the CI.
It's fine that you put intermediate work into a draft PR. But you don't need to use the CI to test things. You can simply run the relevant tests locally. I usually run the relevant tests inside a debugger (e.g. PyCharm) so I can directly see potential issues. It's actually a nice way to debug and develop, to write the test case and then run it in the debugger.
Hey, what's the state here? Did you test this? Does it work? I.e. it doesn't store the intermediate activations in memory?
Hey, no new results yet.
Hey, no new results yet.
But what was the state? What were the old results? Did it work? Does it not store the intermediate activations in memory anymore? Or you don't know?
Also, I don't exactly understand how this API here could be used for variational noise and/or weight dropout. Can you give an example?
Superseeded by #1559