
[Roadmap] DeepSpeed Roadmap Q1 2025

Open loadams opened this issue 11 months ago • 6 comments

This is a living document! For each item here, we intend to link the PR/issue for discussion.

This is DeepSpeed's first attempt at a public roadmap and will be updated with additional details.

  • Long sequence work
  • Torch.compile
  • Universal checkpointing
  • I/O acceleration
  • Accelerator abstraction
  • Tensor parallel for training
    • [x] #7004
    • [ ] #7115

loadams avatar Jan 13 '25 22:01 loadams

Will Multi-Token Prediction, as introduced in DeepSeek-V3, be added to the Q1 roadmap?

zhaoyang-star avatar Feb 11 '25 11:02 zhaoyang-star

Need FP8 training for DeepSeek-MoE.

shiyongde avatar Feb 19 '25 01:02 shiyongde

Plug-in support for the different accelerators

hijeffwu avatar Mar 12 '25 10:03 hijeffwu

@hijeffwu - could you clarify more on what you're requesting? Different accelerators are already supported in DeepSpeed.
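For reference, existing accelerator support is surfaced through the device-agnostic `get_accelerator()` API; a minimal usage sketch (the printed values depend on which backend is installed):

```python
# Minimal sketch: device-agnostic DeepSpeed code via the accelerator abstraction.
# get_accelerator() returns the active backend (CUDA, CPU, HPU, NPU, ...).
from deepspeed.accelerator import get_accelerator

accel = get_accelerator()
print(accel.device_name())                 # e.g. 'cuda' on an NVIDIA machine
print(accel.communication_backend_name())  # e.g. 'nccl'
```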

loadams avatar Mar 12 '25 15:03 loadams

> @hijeffwu - could you clarify more on what you're requesting? Different accelerators are already supported in DeepSpeed.

My idea is as follows:

The current process for adding support for a new accelerator card involves creating a new xxx_accelerator.py file in the accelerators directory and adding a product-specific directory under DeepSpeed/op_builder to adapt the kernels for that chip. However, this architecture lacks a unified backend for the per-chip kernel code.
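A rough sketch of what such an xxx_accelerator.py shim looks like today, assuming the DeepSpeedAccelerator abstract base class; the vendor/device names and the op_builder module path below are illustrative placeholders, not an existing backend:

```python
# Illustrative vendor accelerator shim, modeled on the existing accelerator files.
# 'vendor_xpu' and op_builder.vendor_xpu are hypothetical placeholders.
from deepspeed.accelerator.abstract_accelerator import DeepSpeedAccelerator


class VendorXPU_Accelerator(DeepSpeedAccelerator):
    def __init__(self):
        self._name = 'vendor_xpu'                   # hypothetical torch device type
        self._communication_backend = 'vccl'        # hypothetical collective backend

    def device_name(self, device_index=None):
        if device_index is None:
            return self._name
        return f'{self._name}:{device_index}'

    def communication_backend_name(self):
        return self._communication_backend

    def op_builder_dir(self):
        # Points at the product-specific kernel builders under op_builder/
        return "op_builder.vendor_xpu"

    # ... a real backend must also implement the remaining DeepSpeedAccelerator
    # methods (device/memory queries, streams, RNG, builder lookup, etc.).
```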

Since the primary difference in AI chip vendors' support for DeepSpeed lies in their kernel implementations, would it be possible to use "deepspeed-kernels" as the unified kernel backend for DeepSpeed, while retaining only Python code in the main DeepSpeed repository? This split would be analogous to Megatron-LM + Apex + TransformerEngine, making DeepSpeed more adaptable to diverse AI chip backends.

Key points in this proposal:

  1. Vendor Flexibility: Chip manufacturers could contribute optimized kernels to deepspeed-kernels without modifying core DeepSpeed code.
  2. Maintainability: Simplifies codebase management by isolating low-level optimizations.
  3. Cross-Platform Compatibility: Similar to how TransformerEngine abstracts NVIDIA-specific optimizations.

This architecture aligns with how DeepSpeed is already being adapted to non-NVIDIA hardware in practice.
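A hypothetical sketch of how the proposed split could look from the Python side, assuming vendors ship a separately installed kernels package; the `deepspeed_kernels` module layout and names below are invented for illustration and are not an existing DeepSpeed or DeepSpeed-Kernels API:

```python
# Hypothetical dispatch layer: the main DeepSpeed repo keeps only Python glue
# and resolves compiled kernels from a vendor-provided kernels package.
import importlib


def load_kernel(op_name: str, vendor: str):
    """Resolve a compiled kernel (e.g. 'fused_adam') from a vendor package.

    Assumes each vendor publishes deepspeed_kernels.<vendor>.<op_name>
    exposing a common interface -- this layout is illustrative only.
    """
    try:
        return importlib.import_module(f"deepspeed_kernels.{vendor}.{op_name}")
    except ImportError as err:
        raise RuntimeError(
            f"No '{op_name}' kernel found for vendor '{vendor}'; install the "
            "vendor's deepspeed-kernels build or fall back to a pure-PyTorch path."
        ) from err


# Usage sketch:
#   fused_adam = load_kernel("fused_adam", vendor="cuda")
```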

hijeffwu avatar Mar 13 '25 03:03 hijeffwu

What specific plans do you have for long sequences?

huhuiqi7 avatar Apr 03 '25 01:04 huhuiqi7

Closing this roadmap as it was fairly out of date. We will reopen a new roadmap for Q1 2026.

cc: @tjruwase @sfc-gh-truwase @PKUWZP

loadams avatar Nov 14 '25 02:11 loadams