agd
How to support other layers?
Hey @jxbz, I am currently trying this optimizer with some other networks. Why doesn't it support layers whose parameters have dimension 1 (biases)? Do you have any suggestions for getting it to work on a transformer or other architectures? Thank you!
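For context, the "dimension 1" parameters here are tensors such as biases, as opposed to 2D weight matrices. Below is a minimal PyTorch sketch of how they can be told apart, assuming a plain `nn.Module`. The toy model is hypothetical, and nothing here uses or describes the agd API; it only separates the two kinds of tensors, e.g. if one wanted to hand the 1D ones to a different optimizer.

```python
import torch.nn as nn

# Hypothetical toy two-layer MLP; any nn.Module can be split the same way.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

# Weight matrices have ndim >= 2; biases (and e.g. LayerNorm gains) have ndim == 1.
matrix_params = [p for p in model.parameters() if p.ndim >= 2]
vector_params = [p for p in model.parameters() if p.ndim < 2]

print([tuple(p.shape) for p in matrix_params])  # [(16, 8), (4, 16)]
print([tuple(p.shape) for p in vector_params])  # [(16,), (4,)]
```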