Yuyang Ding issues

Results: 9 issues of Yuyang Ding

[recipe] fix: incorrect reward function in fapo scripts

### What does this PR do? fix incorrect reward function in fapo scripts ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link...

[worker] feat: add support for colocate replicas

### What does this PR do? - add support for colocate replicas - add ci test (reward loop models colocate with actor_rollout_ref) ### Checklist Before Starting - [ ] Search...

[single_controller] feat: support resource pool split method

### What does this PR do? example: `tests/single_controller/test_split_resource_pool.py` ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ]...

[single_controller] feat: support resource_pool split

### What does this PR do? A safer implementation of the resource pool split. For the relevant design and discussion, see https://github.com/volcengine/verl/issues/4261. Also adds more CI tests. ### Checklist Before Starting - [ ] Search...

[RFC] split resource pool

### Motivation In certain scenarios, we need to use a resource pool to initialize multiple instances. https://github.com/volcengine/verl/pull/4226 and https://github.com/volcengine/verl/pull/4233 may introduce some issues; `SubRayResourcePool` may be a safer implementation without...

[model] feat: support discriminative reward model in reward loop

### What does this PR do? > Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review....

[worker] fix: do not pass router address and tokenizer if their value is none

### What does this PR do? as title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ]...

[RFC] Reward Loop

Reward Loop has been implemented in the current main branch in `verl/experimental/reward`, and will refactor almost the entire reward computation pipeline. This issue provides an explanation of Reward Loop and...

[rollout] feat: add support for discriminative reward model in reward loop

### What does this PR do? Reward computation in reward loop will follow the design below: ``` Reward Computation Logic: - if user-customized reward function is provided: -> directly use...