[mcore] verl+megatron development tracking
veRL Megatron-core Development Tracking
This page tracks development of verl + mcore. The milestone target is to enable training DeepSeek-V3 on veRL (#708); the further goal is to continuously improve the verl training experience with the mcore backend.
Progress and TODO
Recent
- [x] update mcore version to 0.11 #392
- [x] use mcore `GPTModel` API instead of the huggingface workaround, with sequence packing #706
- [x] support context parallel #970
- [x] support loading mcore dist_checkpointing #1030
- [x] support Megatron 0.11.0 and vLLM 0.8.2 #851
- [x] support qwen2moe training #1139
- [x] support `Moonlight-16B-A3B` training (WIP) #1284
- [ ] support `Qwen2.5-VL` training #1286
- [x] support EP (expert parallel) #1467
Further
- [ ] FP8 training
- [ ] training-efficiency optimizations
- [ ] support sglang inference engine
- [ ] support trtllm inference engine
Could you merge the TODO list from this issue as well? https://github.com/volcengine/verl/issues/825
How does the mcore backend handle `make_vocab_size_divisible_by`, and where should the vocabulary be padded to meet the tensor-parallel splitting requirements?
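For context, a minimal sketch of the Megatron-LM padding convention this question refers to: the embedding vocab is rounded up to the next multiple of `make_vocab_size_divisible_by * tensor_model_parallel_size` so it can be split evenly across tensor-parallel ranks. The `padded_vocab_size` helper below is a hypothetical standalone illustration of that arithmetic, not verl's or Megatron's actual code path.

```python
def padded_vocab_size(orig_vocab_size: int,
                      make_vocab_size_divisible_by: int = 128,
                      tensor_model_parallel_size: int = 1) -> int:
    """Round the vocab size up so the embedding splits evenly across TP ranks.

    Mirrors the Megatron-LM convention: pad to the next multiple of
    make_vocab_size_divisible_by * tensor_model_parallel_size.
    """
    multiple = make_vocab_size_divisible_by * tensor_model_parallel_size
    return ((orig_vocab_size + multiple - 1) // multiple) * multiple


# GPT-2's 50257-token vocab padded with the default divisor of 128 (TP=1):
print(padded_vocab_size(50257, 128, 1))   # -> 50304
# A 32000-token vocab under TP=8 must be divisible by 128 * 8 = 1024:
print(padded_vocab_size(32000, 128, 8))   # -> 32768
```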
Could you add support for Qwen SFT training with the Megatron backend?
Referencing a related issue: https://github.com/volcengine/verl/issues/708
The SGLang backend (the "support sglang inference engine" item) is already available in verl, right?