[Example] OpenRLHF Integration
This PR adds examples for running DPO, SFT, and RM training with OpenRLHF on SkyPilot. I verified that the training runs on GCP.
I'm not sure why the PR breaks. After some discussion with AI tools, the cause appears to be some Sphinx logic that is not well suited to Windows, and the tools recommended using MyST Markdown syntax to fix it. I'll take another look at this tomorrow, but the tests pass for now with the MyST syntax.
Hi @Michaelvll, I updated the examples to use image_id-based Docker usage and tested them again.
Regarding the docs/source/examples/training/openrlhf.md file, I am still trying to figure out why it does not work if the file simply contains ../../generated-examples/openrlhf.md. I will look at it more closely tomorrow, but I wanted to ask whether you have any ideas or experience with this. My debugging so far leads me to believe it is caused by a Windows issue, but I might be wrong.
@Michaelvll, the PR is ready for review.
By the way, I noticed a test failure that seemed unrelated to this change (Add OpenRLHF Example · skypilot-org/skypilot@732a79a). After another commit, which adds some additional details to the docs, all tests pass (Add OpenRLHF Example · skypilot-org/skypilot@a879808), so I think the tests are somewhat flaky.