Unsloth Integration
Use cases, pain points, and background Why should we do this? Why is this needed or wanted?
Description: What should we do?
Design: Need to integrate with TRL first
Out of scope: What are some items that this issue could be mistaken to cover that this issue should explicitly NOT cover?
Acceptance Criteria:
- [ ] Individual items that need to be finished in order for this issue to be considered completed
Unsloth currently does not support custom rollout function in their patched version of TRL GRPOTrainer it seems, making it difficult to fully use NeMo Gym as a rollout tool.
We can take the same approach as OpenEnv and just use NeMo Gym to verify rollouts, not using multiturn or tool use logic, letting unsloth/TRL handle the single turn, no tool rollout. I notice that OpenEnv only acts as a verifier for Unsloth, but has custom rollout example with TRL.
I think we can request or help add support for custom rollout.
The second challenge is whether unsloth allows serving the LLM as openai compatible responses/chat completions. TRL has a vllm server mode (missing openai endpoints ) but I am not yet sure if unsloth supports it.
Hi @cmunley1 we're going to add support for custom rollouts soon. As for the vllm server mode, we can work on supporting it as well. If it operates just like trl's would that be sufficient?
Hey @mmathew23 do you have a timeline for custom rollout function?
For vllm server mode, I think that operating like trl is sufficient, but an async vllm engine with openai compatible endpoints would be better for us and probably more efficient. maybe @bxyu-nvidia can comment on this
We don't have a hard timeline at the moment, but we are currently working on both compatibility with trl 0.25 and transformers v5.