verl
missing `'` for `actor_rollout_ref.rollout.n` in examples?
System Info
Today, when I tried to run examples/grpo_trainer/run_qwen3moe-30b_megatron_96gb.sh, a strange error occurred:
Could not override 'actor_rollout_ref.rollout.n'.
To append to your config use +actor_rollout_ref.rollout.n=16
Key 'n' is not in struct
full_key: actor_rollout_ref.rollout.n
object_type=dict
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
I found that this happens because the config in _generated_ppo_megatron_trainer.yaml uses 'n' as the key name, while the arguments in run_qwen3moe-30b_megatron_96gb.sh use n.
After making the following change in examples/grpo_trainer/run_qwen3moe-30b_megatron_96gb.sh, the error went away:
actor_rollout_ref.rollout.n
=>actor_rollout_ref.rollout.'n'
Is this a bug? I also found that all scripts in examples are missing the '. I can open a PR to fix this if needed.
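To check how many example scripts are affected before opening a PR, something like the following could be used. It is only a rough sketch: the function names `find_unquoted_overrides` and `scan_examples` are hypothetical helpers, not part of verl, and the script only flags lines that contain the unquoted `actor_rollout_ref.rollout.n=` override. It does not decide whether quoting is the right fix.

```python
import re
from pathlib import Path

# Matches the unquoted override `actor_rollout_ref.rollout.n=<value>`.
# The quoted form `actor_rollout_ref.rollout.'n'=...` is not matched because
# a quote character, not `n`, follows the final dot.
UNQUOTED_N = re.compile(r"actor_rollout_ref\.rollout\.n=")


def find_unquoted_overrides(text: str) -> list[int]:
    """Return 1-based line numbers that contain the unquoted override."""
    return [
        lineno
        for lineno, line in enumerate(text.splitlines(), start=1)
        if UNQUOTED_N.search(line)
    ]


def scan_examples(root: str = "examples") -> dict[str, list[int]]:
    """Map each shell script under `root` to the lines that need review.

    `root` defaults to the `examples` folder mentioned above (assumes the
    script is run from the repository root).
    """
    hits: dict[str, list[int]] = {}
    for path in sorted(Path(root).rglob("*.sh")):
        lines = find_unquoted_overrides(path.read_text(encoding="utf-8"))
        if lines:
            hits[str(path)] = lines
    return hits


if __name__ == "__main__":
    for script, lines in scan_examples().items():
        print(f"{script}: lines {lines}")
```

Run from the repository root, this prints each script together with the line numbers where the unquoted key appears, which should make it easier to judge the scope of a possible fix.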
Information
- [x] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Run examples/grpo_trainer/run_qwen3moe-30b_megatron_96gb.sh
Expected behavior
Above