returnn Gradient checkpointing experiments

Open NeoLegends opened this issue 1 year ago • 5 comments

I'm using this PR to run the CI. Albert, don't bother reviewing yet.

Jun 10 '24 14:06 NeoLegends

> I'm using this PR to run the CI.

It's fine that you put intermediate work into a draft PR. But you don't need to use the CI to test things. You can simply run the relevant tests locally. I usually run the relevant tests inside a debugger (e.g. PyCharm) so I can directly see potential issues. It's actually a nice way to debug and develop, to write the test case and then run it in the debugger.

Jun 11 '24 07:06 albertz

Hey, what's the state here? Did you test this? Does it work? I.e., does it no longer store the intermediate activations in memory?

Jun 24 '24 15:06 albertz
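
For readers unfamiliar with the question being asked here: gradient checkpointing keeps only some activations from the forward pass and recomputes the rest during the backward pass. The following is a minimal pure-Python sketch of that idea on a chain of `tanh` layers. It is not the code from this PR; the function name `forward_backward` and the `segment` parameter are hypothetical and exist only for illustration.

```python
import math


def forward_backward(x0, n_layers, segment=1):
    """Differentiate y = tanh(tanh(...tanh(x0))) with respect to x0.

    segment=1 stores the input of every layer (no checkpointing).
    segment=k stores only the input of every k-th layer and recomputes
    the others during the backward pass.
    Returns (y, dy/dx0, peak number of activations held at once).
    """
    f = math.tanh

    def df(x):
        # Derivative of tanh evaluated at the layer input x.
        return 1.0 - math.tanh(x) ** 2

    # Forward pass: save the input of layer i only at segment boundaries.
    saved = {}
    x = x0
    for i in range(n_layers):
        if i % segment == 0:
            saved[i] = x
        x = f(x)
    y = x
    peak = len(saved)

    # Backward pass: process the segments from last to first.
    grad = 1.0
    for start in sorted(saved, reverse=True):
        end = min(start + segment, n_layers)
        # Recompute the inputs of layers start .. end-1 from the checkpoint.
        inputs = [saved[start]]
        for _ in range(start, end - 1):
            inputs.append(f(inputs[-1]))
        # Checkpoints still held plus the temporarily recomputed inputs.
        peak = max(peak, len(saved) + len(inputs) - 1)
        for x_in in reversed(inputs):
            grad *= df(x_in)
        del saved[start]  # this checkpoint is no longer needed
    return y, grad, peak


y_full, g_full, peak_full = forward_backward(0.5, 16)
y_ckpt, g_ckpt, peak_ckpt = forward_backward(0.5, 16, segment=4)
print(peak_full, peak_ckpt, math.isclose(g_full, g_ckpt))
```

In this toy setup the gradients agree while the checkpointed run holds fewer activations at once, at the cost of extra forward computation. Whether the implementation in this PR achieves the same in practice is exactly what is being asked above.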

Hey, no new results yet.

Jun 24 '24 15:06 NeoLegends

> Hey, no new results yet.

But what was the state? What were the old results? Did it work? Does it not store the intermediate activations in memory anymore? Or you don't know?

Jun 24 '24 16:06 albertz

Also, I don't exactly understand how this API here could be used for variational noise and/or weight dropout. Can you give an example?

Jun 24 '24 16:06 albertz
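
The thread does not contain an answer to this question. As a generic illustration of how recomputation can apply to weight dropout (not the API proposed in this PR), the sketch below keeps only an RNG seed between the forward and backward pass and regenerates the dropout mask when the gradient is needed, so the masked copy of the weights is never stored. The names `dropout_mask` and `linear_weight_dropout` are hypothetical, and the sketch assumes inverted-dropout scaling and a plain dot product as the layer.

```python
import random


def dropout_mask(seed, n, p):
    """Regenerate the same keep/drop mask from a seed (inverted-dropout scaling)."""
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else 1.0 / (1.0 - p) for _ in range(n)]


def linear_weight_dropout(w, x, seed, p=0.5):
    """Compute y = sum_i (w_i * m_i) * x_i and the gradient dy/dw.

    Only `seed` is carried from the forward to the backward pass.
    The mask, and hence the masked weights, are recomputed on demand.
    """
    # Forward pass: build the mask, use it, then drop it.
    mask = dropout_mask(seed, len(w), p)
    y = sum(wi * mi * xi for wi, mi, xi in zip(w, mask, x))
    del mask  # nothing weight-sized is kept for the backward pass

    # Backward pass: the same seed yields the same mask.
    mask = dropout_mask(seed, len(w), p)
    grad_w = [mi * xi for mi, xi in zip(mask, x)]
    return y, grad_w


w = [0.1, -0.2, 0.3, 0.4]
x = [1.0, 2.0, 3.0, 4.0]
y, grad_w = linear_weight_dropout(w, x, seed=1)
print(y, grad_w)
```

The same pattern would cover variational noise by replacing the multiplicative mask with additive noise drawn from the seeded generator. In a real framework the RNG state would have to be captured and restored per operation, which this sketch glosses over.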

Superseded by #1559

Jul 04 '24 12:07 NeoLegends
