trlx
What deepspeed config was this tested on?
📚 The doc issue
There is no information on what config or machines this was tested on, nor what the results actually were. I was unable to get my configuration to work for the example code, but I might be using an untested deepspeed configuration (e.g., stage 3 offloading). I'd like to test with the validated configuration.
Suggest a potential alternative/fix
Could you add the tested configurations and machines? Thanks!
@reciprocated and @Dahoas have the deepspeed configuration. Will have it merged. We usually use stage 2 though.
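For readers unsure what "stage 2" refers to here, a minimal sketch of a ZeRO stage 2 DeepSpeed config is shown below. This is not the configuration the maintainers tested: the batch size, accumulation steps, fp16 setting, and the file name `ds_config_stage2.json` are all illustrative assumptions, and only the `"stage": 2` value comes from this thread.

```python
import json

# Illustrative values only. This is NOT the validated trlx configuration;
# only ZeRO stage 2 is taken from the thread, everything else is assumed.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # assumed value
    "gradient_accumulation_steps": 1,     # assumed value
    "fp16": {"enabled": True},            # assumed; depends on your hardware
    "zero_optimization": {
        "stage": 2,  # partitions optimizer state and gradients across GPUs
    },
}


def write_config(path="ds_config_stage2.json"):
    """Write the sketch config to a JSON file and return its path."""
    with open(path, "w") as f:
        json.dump(ds_config, f, indent=2)
    return path
```

Note that stage 2 has no parameter offloading, unlike the stage 3 offloading setup mentioned in the question, which may be the relevant difference when comparing a failing run against this one.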
https://github.com/CarperAI/trlx/issues/34#issuecomment-1286363960 This accelerate config works. Typically we don't use custom deepspeed configurations unless multi-node is required. If you want more hands-on debugging from our engineers, I'm happy to connect you to MLOps/REs in Carper. Feel free to join the Discord and debug with me; we actively try to support all academic use cases of the API :)
Resolved?
Can the discord link be posted here?
https://discord.gg/canadagoose
Thanks, joined.