cmunley1 comments

Results 17 comments of


                                            cmunley1

Salesforce xlam-function-calling-60k resources server

above is actually reward hacking by calling more and more tools, changing reward structure to exact match.

Salesforce xlam-function-calling-60k resources server

Explicitly don't support Responses API instructions

I think we could just treat as system message

Unsloth Integration

Unsloth currently [does not support custom rollout function](https://github.com/unslothai/unsloth/issues/3573) in their patched version of TRL GRPOTrainer it seems, making it difficult to fully use NeMo Gym as a rollout tool. We...

Unsloth Integration

Hey @mmathew23 do you have a timeline for custom rollout function? For vllm server mode, I think that operating like trl is sufficient, but an async vllm engine with openai...

feat: TRL Integration

TRL has a [custom rollout function](https://github.com/huggingface/trl/blob/main/trl/trainer/grpo_trainer.py#L214) and [vllm server mode](https://github.com/huggingface/trl/blob/main/trl/scripts/vllm_serve.py) that makes the integration easier. The vllm server is not a typical AsyncLLMEngine, it does not have openai chat completions/responses...

Hot reload enabled for native servers

I took a stab at this [here](https://github.com/NVIDIA-NeMo/Gym/compare/main...cmunley1/reload ) It seems to work but not tested extensively