
[Roadmap] DeepSpeed Roadmap Q1 2025

Open loadams opened this issue 11 months ago • 6 comments

This is a living document! For each item here, we intend to link the PR/issue for discussion.

This is DeepSpeed's first attempt at a public roadmap and will be updated with additional details.

  • Long sequence work
  • Torch.compile
  • Universal checkpointing
  • I/O acceleration
  • Accelerator abstraction
  • Tensor parallel for training
    • [x] #7004
    • [ ] #7115

loadams avatar Jan 13 '25 22:01 loadams

Will Multi-Token Prediction, as introduced in DeepSeek-V3, be added to the Q1 roadmap?

zhaoyang-star avatar Feb 11 '25 11:02 zhaoyang-star

Need FP8 training for DeepSeek-MoE.

shiyongde avatar Feb 19 '25 01:02 shiyongde

Plug-in support for the different accelerators

hijeffwu avatar Mar 12 '25 10:03 hijeffwu

@hijeffwu - could you clarify more on what you're requesting? Different accelerators are already supported in DeepSpeed.
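For reference, existing accelerator support is surfaced through the device-agnostic `get_accelerator()` API; a minimal usage sketch (the printed values depend on which backend is installed):

```python
# Minimal sketch: device-agnostic DeepSpeed code via the accelerator abstraction.
# get_accelerator() returns the active backend (CUDA, CPU, HPU, NPU, ...).
from deepspeed.accelerator import get_accelerator

accel = get_accelerator()
print(accel.device_name())                 # e.g. 'cuda' on an NVIDIA machine
print(accel.communication_backend_name())  # e.g. 'nccl'
```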

loadams avatar Mar 12 '25 15:03 loadams

> @hijeffwu - could you clarify more on what you're requesting? Different accelerators are already supported in DeepSpeed.

My idea is as follows:

The current process for adding support for a new accelerator card involves creating a new xxx_accelerator.py file in the accelerators directory and adding a product-specific directory under DeepSpeed/op_builder to adapt the kernels for that chip. However, this architecture lacks a unified backend for the per-chip kernel code.
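A rough sketch of what such an xxx_accelerator.py shim looks like today, assuming the DeepSpeedAccelerator abstract base class; the vendor/device names and the op_builder module path below are illustrative placeholders, not an existing backend:

```python
# Illustrative vendor accelerator shim, modeled on the existing accelerator files.
# 'vendor_xpu' and op_builder.vendor_xpu are hypothetical placeholders.
from deepspeed.accelerator.abstract_accelerator import DeepSpeedAccelerator


class VendorXPU_Accelerator(DeepSpeedAccelerator):
    def __init__(self):
        self._name = 'vendor_xpu'                   # hypothetical torch device type
        self._communication_backend = 'vccl'        # hypothetical collective backend

    def device_name(self, device_index=None):
        if device_index is None:
            return self._name
        return f'{self._name}:{device_index}'

    def communication_backend_name(self):
        return self._communication_backend

    def op_builder_dir(self):
        # Points at the product-specific kernel builders under op_builder/
        return "op_builder.vendor_xpu"

    # ... a real backend must also implement the remaining DeepSpeedAccelerator
    # methods (device/memory queries, streams, RNG, builder lookup, etc.).
```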

Since the primary difference in AI chip vendors' support for DeepSpeed lies in their kernel implementations, would it be possible to use "deepspeed-kernels" as the unified kernel backend for DeepSpeed, while retaining only Python code in the main DeepSpeed repository? This split would be analogous to Megatron-LM + Apex + TransformerEngine, making DeepSpeed more adaptable to diverse AI chip backends.

Key points in this proposal:

  1. Vendor Flexibility: Chip manufacturers could contribute optimized kernels to deepspeed-kernels without modifying core DeepSpeed code.
  2. Maintainability: Simplifies codebase management by isolating low-level optimizations.
  3. Cross-Platform Compatibility: Similar to how TransformerEngine abstracts NVIDIA-specific optimizations.

This architecture aligns with how DeepSpeed is already being adapted to non-NVIDIA hardware in practice.
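A hypothetical sketch of how the proposed split could look from the Python side, assuming vendors ship a separately installed kernels package; the `deepspeed_kernels` module layout and names below are invented for illustration and are not an existing DeepSpeed or DeepSpeed-Kernels API:

```python
# Hypothetical dispatch layer: the main DeepSpeed repo keeps only Python glue
# and resolves compiled kernels from a vendor-provided kernels package.
import importlib


def load_kernel(op_name: str, vendor: str):
    """Resolve a compiled kernel (e.g. 'fused_adam') from a vendor package.

    Assumes each vendor publishes deepspeed_kernels.<vendor>.<op_name>
    exposing a common interface -- this layout is illustrative only.
    """
    try:
        return importlib.import_module(f"deepspeed_kernels.{vendor}.{op_name}")
    except ImportError as err:
        raise RuntimeError(
            f"No '{op_name}' kernel found for vendor '{vendor}'; install the "
            "vendor's deepspeed-kernels build or fall back to a pure-PyTorch path."
        ) from err


# Usage sketch:
#   fused_adam = load_kernel("fused_adam", vendor="cuda")
```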

hijeffwu avatar Mar 13 '25 03:03 hijeffwu

What specific plans do you have for long sequences?

huhuiqi7 avatar Apr 03 '25 01:04 huhuiqi7

Closing this roadmap as it was fairly out of date. We will reopen a new roadmap for Q1 2026.

cc: @tjruwase @sfc-gh-truwase @PKUWZP

loadams avatar Nov 14 '25 02:11 loadams