Yuyang Ding

Results 9 issues of Yuyang Ding

### What does this PR do? Fix the incorrect reward function in the fapo scripts. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link...

### What does this PR do? - Add support for colocated replicas - Add a CI test (reward loop models colocated with `actor_rollout_ref`) ### Checklist Before Starting - [ ] Search...

### What does this PR do? example: `tests/single_controller/test_split_resource_pool.py` ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ]...

### What does this PR do? Safer implementation of split resource pool. For the relevant design and discussion, see https://github.com/volcengine/verl/issues/4261. Add more CI tests. ### Checklist Before Starting - [ ] Search...

### Motivation In certain scenarios, we need to use one resource pool to initialize multiple instances. https://github.com/volcengine/verl/pull/4226 and https://github.com/volcengine/verl/pull/4233 may introduce some issues; `SubRayResourcePool` may be a safer implementation without...
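
To illustrate the motivation above (one resource pool backing several instances), here is a minimal, library-free sketch of carving a parent pool into disjoint slices. It does not use verl's or Ray's API and is not the `SubRayResourcePool` implementation; `PoolSlice`, `split_pool`, and the `reward_model` slice name are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSlice:
    """Hypothetical view onto part of a parent pool, as (node index, GPU index) slots."""
    name: str
    slots: tuple


def split_pool(gpus_per_node, sizes):
    """Partition the GPU slots of a parent pool into disjoint, contiguous slices.

    gpus_per_node: e.g. [4, 4] for two nodes with four GPUs each.
    sizes: mapping of slice name -> number of GPU slots requested.
    """
    # Flatten the parent pool into an ordered list of (node, gpu) slots.
    all_slots = [(node, gpu) for node, n in enumerate(gpus_per_node) for gpu in range(n)]
    if sum(sizes.values()) > len(all_slots):
        raise ValueError("requested more GPU slots than the parent pool holds")

    slices, start = [], 0
    for name, size in sizes.items():
        # Each slice takes the next `size` slots, so no slot is handed out twice.
        slices.append(PoolSlice(name, tuple(all_slots[start:start + size])))
        start += size
    return slices


# Assumed example layout: 2 nodes x 4 GPUs, shared by two instances.
slices = split_pool([4, 4], {"actor_rollout_ref": 6, "reward_model": 2})
for s in slices:
    print(s.name, s.slots)
```

Because each slice only records which slots of the parent it owns, the parent pool is never mutated, and an over-subscription is rejected up front rather than surfacing later as two instances competing for the same GPU.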

### What does this PR do? > Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review....

### What does this PR do? as title ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ]...

Reward Loop has been implemented on the current main branch in `verl/experimental/reward` and will refactor almost the entire reward computation pipeline. This issue provides an explanation of Reward Loop and...

### What does this PR do? Reward computation in reward loop will follow the design below: ``` Reward Computation Logic: - if user-customized reward function is provided: -> directly use...