Pipeclean NeMo RL training with all environments
Use cases, pain points, and background
Description:
Design:
Out of scope:
Acceptance Criteria:
- [ ] Every training environment can be trained easily with both an instruct model and a thinking model
- [ ] Any fixes needed along the way are made
Please also check the Hugging Face datasets themselves on the HF Hub. We also want to fix issues like the one in the screenshot below: https://huggingface.co/datasets/nvidia/Nemotron-RL-math-OpenMathReasoning
I believe the fix for this particular issue is to rename the file from `open_math_reasoning_problems.jsonl` to `train.jsonl` so that it is picked up.
For QA, please ensure that all the rows in the HF dataset match what is expected
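For the row check, a minimal sketch of what a local QA pass over a downloaded JSONL file could look like. It is only an illustration: the helper name `check_jsonl_rows` and the `required_keys` / `expected_rows` parameters are hypothetical, and the actual expected schema and row count for each dataset still need to be confirmed per environment.

```python
import json
from pathlib import Path


def check_jsonl_rows(path, required_keys, expected_rows=None):
    """Return a list of human-readable problems found in a JSONL file.

    Hypothetical QA helper: `required_keys` and `expected_rows` are
    assumptions supplied by the caller, not values defined by the dataset.
    """
    problems = []
    n_rows = 0  # counts lines that parsed as JSON
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                problems.append(f"line {line_no}: blank line")
                continue
            try:
                row = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
                continue
            n_rows += 1
            if not isinstance(row, dict):
                problems.append(f"line {line_no}: row is not a JSON object")
                continue
            # Report any required field that this row does not carry.
            missing = sorted(set(required_keys) - row.keys())
            if missing:
                problems.append(f"line {line_no}: missing keys {missing}")
    if expected_rows is not None and n_rows != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {n_rows}")
    return problems
```

Calling it as `check_jsonl_rows("train.jsonl", ["question", "expected_answer"])` (the field names here are placeholders) returns an empty list when every row parses and carries the listed keys, so it can be run once per dataset before and after the rename.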