
[FEATURE] Add DeepSpeed / ZeRO support in training script

Open Tianyi-Franklin-Wang opened this issue 1 week ago • 0 comments

Is your feature request related to a problem? Please describe.
DeepSpeed/ZeRO provides meaningful acceleration and memory reduction for large-scale training. However, timm’s training script currently does not include an official or built-in way to enable DeepSpeed/ZeRO. I noticed there was an earlier attempt in issue #490, but it appears to have been discontinued. I have implemented DeepSpeed/ZeRO support (with a few compromises) in my own training code built on top of the timm training script. If you think this would be a useful addition, I’d be happy to prepare a PR integrating it into the official timm codebase.

Describe the solution you'd like
The current implementation integrates DeepSpeed as an optional dependency and follows timm’s existing training structure. Specifically, it:

  • Introduces a set of new CLI flags that match the current argument parsing style while enabling DeepSpeed and passing configuration options, such as:
    • --deepspeed
    • --ds-zero-stage {0,1,2,3}
    • --ds-offload-optimizer {none,cpu,nvme}
    • --ds-offload-param {none,cpu,nvme}
  • Uses a small helper function, build_ds_config, to construct a DeepSpeed config dict/JSON directly from existing timm arguments (batch size, gradient accumulation, AMP dtype, clipping, etc.).
  • Wraps the model and parameters with deepspeed.initialize only when --deepspeed is enabled, keeping the non-DeepSpeed training path completely unchanged. This preserves full backward compatibility with existing scripts while giving users an opt-in path to ZeRO acceleration.
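As a rough illustration of the first two points, a sketch of the flag parsing and config builder might look like the following. The flag names come from this proposal; the `args` fields read here (`batch_size`, `grad_accum_steps`, `clip_grad`, `amp`, `amp_dtype`) mirror timm's existing parser, and the config keys (`train_micro_batch_size_per_gpu`, `zero_optimization`, etc.) are standard DeepSpeed config entries — exact wiring would of course be settled in the PR:

```python
import argparse


def add_deepspeed_args(parser):
    # Opt-in DeepSpeed flags, matching the style of timm's existing CLI.
    group = parser.add_argument_group('DeepSpeed parameters')
    group.add_argument('--deepspeed', action='store_true', default=False,
                       help='Enable DeepSpeed/ZeRO training (requires the deepspeed package)')
    group.add_argument('--ds-zero-stage', type=int, default=2, choices=[0, 1, 2, 3],
                       help='ZeRO optimization stage')
    group.add_argument('--ds-offload-optimizer', default='none', choices=['none', 'cpu', 'nvme'],
                       help='Offload optimizer state')
    group.add_argument('--ds-offload-param', default='none', choices=['none', 'cpu', 'nvme'],
                       help='Offload parameters (ZeRO-3 only)')
    return parser


def build_ds_config(args):
    # Translate existing timm arguments into a DeepSpeed config dict.
    config = {
        'train_micro_batch_size_per_gpu': args.batch_size,
        'gradient_accumulation_steps': getattr(args, 'grad_accum_steps', 1),
        'zero_optimization': {'stage': args.ds_zero_stage},
    }
    if args.ds_offload_optimizer != 'none':
        config['zero_optimization']['offload_optimizer'] = {'device': args.ds_offload_optimizer}
    if args.ds_offload_param != 'none':
        config['zero_optimization']['offload_param'] = {'device': args.ds_offload_param}
    if getattr(args, 'clip_grad', None) is not None:
        config['gradient_clipping'] = args.clip_grad
    if getattr(args, 'amp_dtype', '') == 'bfloat16':
        config['bf16'] = {'enabled': True}
    elif getattr(args, 'amp', False):
        config['fp16'] = {'enabled': True}
    return config
```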
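And the opt-in wrapping of the third point could be sketched as below: when the flag is off, model and optimizer pass through untouched; when it is on, `deepspeed.initialize` replaces them with a DeepSpeed engine. Function and variable names here are illustrative, not the final API:

```python
def setup_model_and_optimizer(args, model, optimizer, ds_config=None):
    """Wrap model/optimizer with DeepSpeed only when --deepspeed is set;
    otherwise return them unchanged so the existing path is untouched."""
    if not getattr(args, 'deepspeed', False):
        # Existing timm training path, fully unchanged.
        return model, optimizer
    # Optional dependency, imported only when the feature is enabled.
    import deepspeed
    # deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler);
    # the engine then handles backward() and step() itself.
    model_engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        optimizer=optimizer,
        config=ds_config,
    )
    return model_engine, optimizer
```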

Additional context
Model EMA, logging, and checkpointing are not yet handled elegantly in the DeepSpeed path; I can refine these components and better align them with timm's existing utilities if this feature is accepted. In my training runs, the speedup and memory savings from DeepSpeed/ZeRO have been quite decent.

Tianyi-Franklin-Wang, Dec 05 '25 07:12