[Bug] adam_pax triggers a "reuse donated buffer" warning
Hi, I noticed that when I use adam_pax instead of adamw as the optimizer, it emits a "reuse donated buffer" warning. Is this expected? Also, why does the code use adam_pax instead of the standard optax.adam, the way it uses optax.adamw for adamw?
Thank you very much for your help! @rwitten
I'd recommend not using this optimizer -- it is only for MLPerf.
@ZhiyuLi-goog -- can you look at the (quite scary) warning?
Thank you @LeoXinhaoLee for the heads-up.
I haven't seen this warning message before, and I'm not sure how to reproduce it. Do you have any sample code that reproduces it? That would help me troubleshoot the issue.
Hi, as I recall, running a llama-7b model with the Adam optimizer gives this warning at the start of training. I think the warning could occur with other models as well; the problem seems orthogonal to the model.
My device is a v3-512 pod. Meanwhile, would you mind helping me with another issue I posted about the data loading pipeline? Thank you so much!
Hi @LeoXinhaoLee
Many thanks for finding the bug. I have filed a fix. Feel free to let me know if you have any new questions.
The fix has been merged, and I've marked this issue as fixed.
Closed it with a fix.