[roadmap] verl Q3 development
Past roadmap discussions for reference: https://github.com/volcengine/verl/issues/710 https://github.com/volcengine/verl/issues/22
The most important goal for verl in Q3 is to make it a modular, foundational library that the community can extend: a starting point rather than the destination.
composable model engines
Finish up https://github.com/volcengine/verl/discussions/1560 such that the parallelism strategy is implemented at the engine level, without exposing details to the worker (role) level. The fsdp/megatron engines are expected to be created and run in a standalone fashion, and be reused across different roles.
- [x] fsdp actor, critic, ref (focus on fsdp2)
- [ ] megatron actor, critic, ref
- [ ] torchtitan integration (call for contribution)
- [ ] switch all recipe/examples from fsdp1 to fsdp2 by default (and remove ill-maintained ones)
Work in progress interface for comments https://github.com/volcengine/verl/pull/1977
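To make the goal above more concrete, here is a minimal sketch of an engine that hides its parallelism strategy and is shared by several roles. Every name in it (`BaseEngine`, `ToyEngine`, `Role`, `init_model`, `forward_backward`) is hypothetical and chosen for illustration only; it is not the interface proposed in the PR above, and the "sharding" is simulated with plain list slicing.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseEngine(ABC):
    """Hypothetical engine interface: the parallelism strategy lives inside."""

    @abstractmethod
    def init_model(self) -> None:
        ...

    @abstractmethod
    def forward_backward(self, batch: List[float]) -> Dict[str, float]:
        ...


class ToyEngine(BaseEngine):
    """Stand-in for an fsdp/megatron engine; sharding is simulated by slicing."""

    def __init__(self, num_shards: int) -> None:
        self.num_shards = num_shards
        self.ready = False

    def init_model(self) -> None:
        self.ready = True

    def forward_backward(self, batch: List[float]) -> Dict[str, float]:
        if not self.ready:
            raise RuntimeError("call init_model() first")
        # Split the batch across simulated shards; callers never see this step.
        shards = [batch[i::self.num_shards] for i in range(self.num_shards)]
        total = sum(sum(shard) for shard in shards)
        return {"loss": total / len(batch)}


class Role:
    """A worker role (actor, critic, ref) that only talks to the engine API."""

    def __init__(self, name: str, engine: BaseEngine) -> None:
        self.name = name
        self.engine = engine

    def step(self, batch: List[float]) -> Dict[str, float]:
        return self.engine.forward_backward(batch)


engine = ToyEngine(num_shards=2)
engine.init_model()
actor = Role("actor", engine)
ref = Role("ref", engine)  # the same engine object is reused across roles
```

The point of the sketch is that `Role` contains no parallelism-specific code, so swapping `ToyEngine` for a different backend would not require touching the role.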
rollout workers
- [ ] optimize server mode rollout performance
- [ ] modular rollout workers: VllmRolloutWorker and SGLangRolloutWorker, exposing the same APIs
- [ ] support model with random init weight
- [ ] weight resharding: optimize tp x dp dispatch, and support receiving weight from separate resource groups
- [ ] Agent RL infrastructure https://github.com/volcengine/verl/issues/2618
Additional ongoing efforts:
- https://github.com/zhaochenyang20/Awesome-ML-SYS-Tutorial/issues/131
- https://github.com/volcengine/verl/issues/1882
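As a rough illustration of "exposing the same APIs", the sketch below defines one structural interface and a stub backend that satisfies it. The method names (`update_weights`, `generate`) and the `EchoRolloutWorker` class are assumptions made for this example; they are not the actual APIs of VllmRolloutWorker or SGLangRolloutWorker, and no inference engine is called.

```python
from typing import Dict, List, Protocol


class RolloutWorker(Protocol):
    """Hypothetical common interface for all rollout backends."""

    def update_weights(self, state: Dict[str, float]) -> None:
        ...

    def generate(self, prompts: List[str]) -> List[str]:
        ...


class EchoRolloutWorker:
    """Stub backend; a real worker would wrap an inference engine instead."""

    def __init__(self, tag: str) -> None:
        self.tag = tag
        self.version = 0

    def update_weights(self, state: Dict[str, float]) -> None:
        # Only count weight syncs here; a real worker would load tensors.
        self.version += 1

    def generate(self, prompts: List[str]) -> List[str]:
        return [f"{self.tag}:v{self.version}:{p}" for p in prompts]


def rollout_step(
    worker: RolloutWorker, state: Dict[str, float], prompts: List[str]
) -> List[str]:
    """Trainer-side code depends only on the shared interface."""
    worker.update_weights(state)
    return worker.generate(prompts)
```

Because `rollout_step` is typed against the protocol, any backend implementing the two methods can be dropped in without changes to the caller.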
async & disaggregated architecture
- [x] one-step off async pipeline (WIP: https://github.com/volcengine/verl/pull/2231), further performance optimization & profiling needed
- [ ] streaming/partial rollout (WIP: https://github.com/volcengine/verl/pull/2200)
- [ ] performance tuning, and reference throughput benchmark across [model type, model size, seqlen, hardware, num accelerators, worker role] to achieve better disaggregated resource allocation
- [ ] fully-async pipeline
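For readers new to the idea, the following toy sketch shows one way to read "one-step off": the rollout for step t+1 is launched before the update for step t finishes, so from the second step on the trainer consumes data generated by weights that are one version stale. This is an assumption about the general scheduling pattern, not a description of the linked PR; `generate`, `train`, and `one_step_off` are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def generate(policy_version: int, step: int) -> Dict[str, int]:
    """Placeholder rollout: records which weights produced the batch."""
    return {"step": step, "policy_version": policy_version}


def train(batch: Dict[str, int], policy_version: int) -> int:
    """Placeholder update: returns the next policy version."""
    return policy_version + 1


def one_step_off(num_steps: int) -> List[Tuple[int, int, int]]:
    """Return (step, rollout_version, trainer_version) for each step."""
    log = []
    version = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(generate, version, 0)
        for step in range(num_steps):
            batch = pending.result()
            if step + 1 < num_steps:
                # Launch the next rollout with the current, pre-update weights
                # so that it overlaps with the training step below.
                pending = pool.submit(generate, version, step + 1)
            log.append((batch["step"], batch["policy_version"], version))
            version = train(batch, version)
    return log
```

In the returned log, the gap between the trainer version and the rollout version is 0 on the first step and 1 afterwards, which is the bounded staleness that the throughput gain is traded against.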
multi-turn, data, config infra
- [ ] better message infra for multi-turn messages, dense reward @SwordFaith
- [ ] better dataset schema for train & rollout. We need documentation too. TRL's documentation is good https://huggingface.co/docs/trl/en/dataset_formats @SwordFaith
- [ ] use tensordict and nested-tensor to remove padding and replace DataProto
- [ ] replace OmegaConf with read-only dataclasses for verl internal config passing, which also makes unit testing easier https://github.com/volcengine/verl/pull/2379 https://github.com/volcengine/verl/pull/2147/files
- [ ] P1: distributed data pool from https://arxiv.org/pdf/2507.01663v1 https://github.com/volcengine/verl/issues/2539
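The read-only dataclass item in the list above can be illustrated with the standard library alone. The config classes and field names below (`OptimConfig`, `ActorConfig`, `ppo_mini_batch_size`, and so on) are hypothetical examples, not verl's actual config schema; the sketch only shows the pattern of frozen instances, validation in `__post_init__`, and derived copies via `dataclasses.replace`.

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace


@dataclass(frozen=True)
class OptimConfig:
    lr: float = 1e-6
    weight_decay: float = 0.01


@dataclass(frozen=True)
class ActorConfig:
    strategy: str = "fsdp2"
    ppo_mini_batch_size: int = 256
    optim: OptimConfig = field(default_factory=OptimConfig)

    def __post_init__(self) -> None:
        # Validation runs once at construction and again for replace() copies.
        if self.ppo_mini_batch_size <= 0:
            raise ValueError("ppo_mini_batch_size must be positive")


cfg = ActorConfig()

try:
    cfg.strategy = "megatron"  # mutation is rejected on a frozen instance
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

# Overrides produce a new object and leave the original untouched.
megatron_cfg = replace(cfg, strategy="megatron")
```

A unit test can then build exactly the config it needs with a constructor call, without loading YAML or worrying that another component mutated the object in flight.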
streamline new model workflow
- [ ] document the workflow to add a new hf model to verl. Currently, with the latest vllm, there's no need to add the weight loader mentioned in https://verl.readthedocs.io/en/latest/advance/fsdp_extension.html
- [ ] better abstraction and registration system for multi-modal models. Currently different multi-modals have inconsistent config attr (e.g. rope), freeze/unfreeze setup, input/output processing... (ideally this should be done at huggingface transformers level but it's not sufficient right now cc @NielsRogge) (RFC needed)
- [ ] verl needs a documentation page about the latest status of model support and per model related features (lora, sequence parallelism, megatron, etc)
high quality recipes and end2end optimizations
- [x] retool recipe (code is ready, going through reviews)
- [ ] SOTA multimodal vlm RL recipe (call for contribution)
- [ ] enhance DAPO recipe with larger models, and provide scripts with high training throughput (many perf knobs are not turned on in the current script)
- we welcome more recipes from the community, please open an RFC if you're interested in contributing before opening any PR for recipes https://github.com/volcengine/verl/issues/2136
Additional existing ongoing features:
- https://github.com/volcengine/verl/issues/1033
- https://github.com/volcengine/verl/discussions/2171
Many roadmap tasks in this doc were initiated by @vermouth1992 and @SwordFaith, and credit goes to them.
Please let me know which tasks I can start with, and I will take them up. Do we have a community meeting, a Slack, or another channel that we use for communication?
The code is very good. Can you support the latest rollout PP?