Chris Wing
**Describe the bug** The ray.sub script currently requires users to submit SLURM jobs from the NeMo RL repository directory, which prevents users from organizing their training scripts and experiments in...
## Document Multi-Step Patterns ### Context Users have asked how to handle multi-step agentic tasks where the agent makes multiple tool calls within a single trajectory. This requires understanding: -...
### Background Users want to train models on multi-turn conversational tasks where the agent handles back-and-forth interactions with a user. ### Problem Users need guidance on: - What "multi-turn" means...
### Background Often during multi-turn conversational training, users need to simulate realistic user responses during rollout collection. ### Problem Users need guidance on: - When to use LLM-based user simulation...
## Tutorial: How to Incorporate LLM as a Judge in Verification Logic ### Background Users have asked how to use LLM-as-a-judge for verification in their resource servers. This is particularly...
## Add Architecture Diagram and Clarify NeMo Gym + NeMo RL Integration ### Background Users were unclear on the data flow between NeMo Gym, NeMo RL, and the policy model....
**Use cases, pain points, and background** While they differ in their architectural approach, both frameworks target the same goal: enabling Reinforcement Learning from Verifiable Reward (RLVR) at scale. **Description**: Goal:...
**Describe the bug** A user noted confusion because the column `environment_name` has the value `workbench`, but the resource server in NeMo Gym is named `workplace_assistant`. The user was also unclear...
**Describe the bug** The workplace assistant `dataset_preprocess.py` has an incorrect dataset reference in the `get_samples` function. **Steps/Code to reproduce bug** This file doesn't exist https://github.com/NVIDIA-NeMo/Gym/blob/287d08d4ccba3c146e44616ffa1cf3b9ddaab92b/resources_servers/workplace_assistant/dataset_preprocess.py#L87 dataset = load_dataset("Nexusflow/250319-workplace_assistant-fulleval", split=split) **Expected behavior** Use...
Badges should be added to the top of the README: license, CI/CD, Python version, and release version.