DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Regarding this issue: https://github.com/pytorch/pytorch/issues/97079 There are some comm ops in DeepSpeed that, for the moment, aren't traceable by Dynamo, and probably the best medium-term solution is to make them...
CompiledModuleWrapper is implemented as a wrapper class for the model. I see a few issues when running unit tests with compile enabled. 1. isinstance(self.module, PipelineModule) used in multiple places in...
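A minimal sketch of the first issue described above. The class bodies here are hypothetical stand-ins (the real `CompiledModuleWrapper` and `PipelineModule` live in DeepSpeed); the point is only that wrapping a module breaks `isinstance` checks against the wrapped type, even when attribute access is forwarded:

```python
class PipelineModule:  # stand-in for deepspeed.pipe.PipelineModule
    pass

class CompiledModuleWrapper:
    """Hypothetical wrapper that forwards attribute access to the inner module."""
    def __init__(self, module):
        object.__setattr__(self, "_module", module)

    def __getattr__(self, name):
        # Delegation makes attributes work, but does NOT affect isinstance().
        return getattr(self._module, name)

model = PipelineModule()
wrapped = CompiledModuleWrapper(model)

print(isinstance(model, PipelineModule))    # True
print(isinstance(wrapped, PipelineModule))  # False -- the wrapper hides the type
```

This is why `isinstance(self.module, PipelineModule)` checks sprinkled through the engine stop matching once the model is wrapped.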
**Describe the bug** * Enable BF16 training * Set gradient accumulation types to FP32 * Enable ZeRO-1, and CPU offload * Enable overlap_comm * Tune train batch size so that...
If there are N GPUs, the snapshot will consist of N files for optimizer states, each corresponding to one GPU (let me know if this understanding is incorrect). Then,...
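A sketch of the layout described above, under the assumption that ZeRO partitions optimizer state roughly evenly across ranks, so the checkpoint holds one shard file per rank. The partitioning function and file names below are illustrative, not DeepSpeed's actual implementation:

```python
def shard_optimizer_state(num_params, world_size):
    """Roughly even partition: how many state elements each rank owns."""
    base, rem = divmod(num_params, world_size)
    return [base + (1 if r < rem else 0) for r in range(world_size)]

world_size = 4
shards = shard_optimizer_state(num_params=10, world_size=world_size)

# One optimizer-state file per rank (illustrative naming, modeled on the
# per-rank files seen in a ZeRO checkpoint directory).
files = [f"zero_pp_rank_{r}_mp_rank_00_optim_states.pt" for r in range(world_size)]

print(shards)      # [3, 3, 2, 2]
print(len(files))  # 4 -- one file per GPU/rank
```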
Is DeepSpeed compatible with AMD CPUs? When I import the DeepSpeedCPUAdam optimizer on an AMD CPU, I get the following warning: [WARNING] [cpu_adam.py:84:__init__] FP16 params for CPUAdam may not work on AMD...
**Describe the bug** I reviewed the initialization of self.gradient_accumulation_steps in the DeepSpeedConfig module when only train_batch and micro_batch are set (deepspeed Version: 0.13.1): ```python grad_acc = train_batch // micro_batch grad_acc...
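A sketch of the derivation quoted above. The function mirrors the integer divisions shown in the snippet (first by micro_batch, then, as an assumption for illustration, by world size); the concrete numbers are made up to show how non-divisible settings silently truncate rather than raise:

```python
def derive_grad_acc(train_batch, micro_batch, world_size):
    """Illustrative version of the DeepSpeedConfig derivation quoted above."""
    grad_acc = train_batch // micro_batch   # integer division, as in the snippet
    grad_acc //= world_size                 # assumed follow-up division by ranks
    return grad_acc

# Exact division: the result is consistent (64 == 2 * 4 * 8).
print(derive_grad_acc(64, 4, 8))  # 2

# Non-divisible settings truncate silently: 60 != 1 * 4 * 8.
print(derive_grad_acc(60, 4, 8))  # 1
```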
**Describe the bug** Hi, I'm trying to run GPT model pretraining with the [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed?ysclid=ls8rr5jnv3799144357) pipeline and the ZeRO-3 + MiCS sharding strategy, but got the following log: ``` WARNING: Runtime Error while waiting...
Hello, I would like to ask for assistance in solving a problem I've encountered. I am currently training an MLLM with DeepSpeed, and I've introduced an additional modality to the...
PR#5104 (Remove optimizer step on initialization) breaks loading a universal checkpoint for BF16_Optimizer. This is because universal checkpointing attempts to load the optimizer states into the lp._hp_mapping.optim_state dictionary before they are initialized...
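A minimal sketch of the ordering problem described above, with hypothetical names: if the first optimizer step is what populates the per-param state dictionary, then loading a checkpoint into that dictionary before any step has run finds no entries to fill:

```python
class HpMapping:
    """Hypothetical stand-in for lp._hp_mapping: state is filled lazily."""
    def __init__(self):
        self.optim_state = {}  # populated only by the first optimizer step

def load_universal_checkpoint(mapping, saved_state):
    """Illustrative loader: writes saved values into pre-existing state slots."""
    for key, value in saved_state.items():
        if key not in mapping.optim_state:
            raise KeyError(f"optimizer state '{key}' not initialized yet")
        mapping.optim_state[key] = value

mapping = HpMapping()
try:
    # No optimizer step has run, so optim_state is empty and loading fails.
    load_universal_checkpoint(mapping, {"exp_avg": [0.0]})
except KeyError as e:
    print(e)
```

Removing the step-on-initialization therefore shifts the burden onto the loader, which must create the state entries itself instead of assuming they already exist.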
The latest actions/checkout uses the latest (non-deprecated) version of Node.js (16 -> 20). More information [here](https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/): ``` Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3....