Gym Unsloth Integration

Use cases, pain points, and background Why should we do this? Why is this needed or wanted?

Description: What should we do?

Design: Need to integrate with TRL first

Out of scope: What are some items that this issue could be mistaken to cover that this issue should explicitly NOT cover?

Acceptance Criteria:

[ ] Individual items that need to be finished in order for this issue to be considered completed

Nov 21 '25 17:11 bxyu-nvidia

Unsloth currently does not support custom rollout function in their patched version of TRL GRPOTrainer it seems, making it difficult to fully use NeMo Gym as a rollout tool.

We can take the same approach as OpenEnv and just use NeMo Gym to verify rollouts, not using multiturn or tool use logic, letting unsloth/TRL handle the single turn, no tool rollout. I notice that OpenEnv only acts as a verifier for Unsloth, but has custom rollout example with TRL.

I think we can request or help add support for custom rollout.

The second challenge is whether unsloth allows serving the LLM as openai compatible responses/chat completions. TRL has a vllm server mode (missing openai endpoints ) but I am not yet sure if unsloth supports it.

Dec 02 '25 18:12 cmunley1

Hi @cmunley1 we're going to add support for custom rollouts soon. As for the vllm server mode, we can work on supporting it as well. If it operates just like trl's would that be sufficient?

Dec 04 '25 18:12 mmathew23

Hey @mmathew23 do you have a timeline for custom rollout function?

For vllm server mode, I think that operating like trl is sufficient, but an async vllm engine with openai compatible endpoints would be better for us and probably more efficient. maybe @bxyu-nvidia can comment on this

Dec 04 '25 18:12 cmunley1

We don't have a hard timeline at the moment, but we are currently working on both compatibility with trl 0.25 and transformers v5.

Dec 04 '25 21:12 mmathew23