axon
Do not include stateful parameters in optimizer state
Stateful parameters are not part of the optimization process, so there is no need to include them in the optimizer state. Duplicating them there wastes memory, and for stateful keys such as dropout it leads to memory issues.
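To illustrate the idea, here is a minimal sketch in plain Python rather than Axon's actual Elixir code. The names `split_trainable`, `init_adam_state`, and `state_keys`, as well as the layer and parameter names, are hypothetical and chosen only for this example. It assumes an Adam-style optimizer that keeps two moment buffers per tensor, and shows the buffers being allocated only for trainable entries.

```python
# Hypothetical sketch: names and data layout are illustrative, not Axon's API.

def split_trainable(params, state_keys):
    """Return only the entries the optimizer should track.

    params:     layer name -> {param name -> list of numbers}
    state_keys: layer name -> set of param names that are non-trainable
                state (e.g. a dropout PRNG key, batch-norm running stats)
    """
    trainable = {}
    for layer, entries in params.items():
        skip = state_keys.get(layer, set())
        kept = {name: value for name, value in entries.items() if name not in skip}
        if kept:  # drop layers that hold nothing but state
            trainable[layer] = kept
    return trainable


def zeros_like(tree):
    """Build a zero-filled copy with the same nested shape as `tree`."""
    return {
        layer: {name: [0.0] * len(value) for name, value in entries.items()}
        for layer, entries in tree.items()
    }


def init_adam_state(trainable):
    """Allocate first- and second-moment buffers for trainable tensors only."""
    return {"mu": zeros_like(trainable), "nu": zeros_like(trainable), "count": 0}


params = {
    "dense_0": {"kernel": [0.1, -0.2, 0.3], "bias": [0.0]},
    "dropout_0": {"key": [42, 7]},
    "batch_norm_0": {"gamma": [1.0], "beta": [0.0], "mean": [0.0], "var": [1.0]},
}
state_keys = {"dropout_0": {"key"}, "batch_norm_0": {"mean", "var"}}

opt_state = init_adam_state(split_trainable(params, state_keys))
```

In this sketch the dropout key and the batch-norm running statistics never get moment buffers, so the optimizer state holds copies only for tensors that actually receive gradient updates.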