cmunley1
**Describe the bug** CPU memory usage steadily increases until OOM with Qwen 235b a22b. The run OOMs at the end of this chart. Customer-reported; we do not have a full reproducer yet, but the RL...
**Describe the bug** Large logprob errors with qwen30b a3b with gspo ``` grpo: num_prompts_per_step: 256 num_generations_per_prompt: 16 loss_fn: reference_policy_kl_penalty: 0 ratio_clip_min: 3e-4 ratio_clip_max: 4e-4 ratio_clip_c: null use_on_policy_kl_approximation: false use_importance_sampling_correction: false...
Need to set the uv pip install Python flag in Colab environments when launching servers. Usage: `ng_run "+config_paths=[...]" +uv_pip_set_python=true` (defaults to false). Addresses https://github.com/NVIDIA-NeMo/Gym/issues/370 and is needed for the notebook here: https://docs.unsloth.ai/models/nemotron-3#reinforcement-learning--nemo-gym
Single-step tutorial for GRPO with Unsloth using a NeMo Gym verifier! Addresses https://github.com/NVIDIA-NeMo/Gym/issues/370
**Use cases, pain points, and background** We should add an example, recipe, or adapter for training an agent defined in NAT with NeMo Gym and NeMo RL. **Description**: Simple example...
function calling resources server based on https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k
Adds a section for single-step training with Unsloth and TRL. Not sure if these should be broken into separate sections. Left as one since the same notebook works for both,...
### Feature request

OpenAI-compatible responses and chat completions endpoints in vllm_serve.py, so that applications implementing a custom rollout function can use these endpoints as if they were an OpenAI or...