Amos optimizer support
### 🚀 The feature, motivation, and pitch
https://arxiv.org/abs/2210.11693
The Amos paper reports better multi-accelerator scaling and better performance than AdamW for autoregressive and masked language modeling. We should apply it to trlX and see whether it speeds up RLHF training.
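A rough sketch of what adoption could look like. This assumes a PyTorch port of Amos exposing the standard `torch.optim.Optimizer` interface; the reference implementation from the paper is JAX-based, so the `amos_pytorch` package, the `Amos` class, and its hyperparameter names below are all hypothetical placeholders, not a real API:

```python
import torch
import torch.nn as nn

# Hypothetical import: assumes a community PyTorch port of Amos that
# follows the standard torch.optim.Optimizer interface. The paper's
# reference implementation is in JAX.
from amos_pytorch import Amos  # placeholder, not a real package name

model = nn.Linear(768, 768)

# Drop-in replacement for the usual AdamW construction in a training
# loop; the lr value here is illustrative, not taken from the paper.
optimizer = Amos(model.parameters(), lr=1e-3)

for step in range(10):
    x = torch.randn(32, 768)
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

If the port matches the `Optimizer` interface, wiring it into trlX should mostly be a matter of exposing it as a config option wherever AdamW is currently constructed.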
### Alternatives
We could alternatively stay with AdamW, which is thoroughly tried and tested.
### Additional context
We need to seriously consider the wall-time cost of Amos and whether it creates any optimization bottlenecks for us.
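One way to sanity-check the wall-time concern before committing: time `optimizer.step()` for both optimizers on the same dummy workload. A minimal sketch (real numbers would have to come from a full trlX training run):

```python
import time
import torch
import torch.nn as nn

def time_steps(optimizer, model, n_steps=100):
    """Rough average wall time per optimization step on a dummy workload."""
    x = torch.randn(64, 1024)
    start = time.perf_counter()
    for _ in range(n_steps):
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (time.perf_counter() - start) / n_steps

model = nn.Sequential(nn.Linear(1024, 1024), nn.GELU(), nn.Linear(1024, 1024))
adamw = torch.optim.AdamW(model.parameters(), lr=1e-4)
print(f"AdamW: {time_steps(adamw, model):.4f}s/step")
# Repeat with an Amos instance once a PyTorch implementation is in hand.
```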
@LouisCastricato I'd like to work on this. I'll try to get to it over the weekend but might need some guidance.
Let us know if you run into any issues here or on the trlx channel.
Following up
Closing issue as inactive.