AttributeError: 'FP16_DeepSpeedZeroOptimizer' object has no attribute 'ipg_index'
Hi, I want to use DeepSpeed to speed up my transformer, and I ran into the following problem:
File "main.py", line 460, in <module>
main(args)
File "main.py", line 392, in main
train_stats = train_one_epoch(
File "/opt/ml/code/deepspeed/engine.py", line 57, in train_one_epoch
loss_scaler(loss, optimizer, clip_grad=clip_grad, clip_mode=clip_mode,
File "/usr/local/lib/python3.8/dist-packages/timm/utils/cuda.py", line 43, in __call__
self._scaler.scale(loss).backward(create_graph=create_graph)
File "/usr/local/lib/python3.8/dist-packages/torch/tensor.py", line 245, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py", line 145, in backward
Variable._execution_engine.run_backward(
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/zero/stage2.py", line 661, in reduce_partition_and_remove_grads
self.reduce_ready_partitions_and_remove_grads(param, i)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/zero/stage2.py", line 1104, in reduce_ready_partitions_and_remove_grads
self.reduce_independent_p_g_buckets_and_remove_grads(param, i)
File "/usr/local/lib/python3.8/dist-packages/deepspeed/runtime/zero/stage2.py", line 724, in reduce_independent_p_g_buckets_and_remove_grads
new_grad_tensor = self.ipg_buffer[self.ipg_index].narrow(
AttributeError: 'FP16_DeepSpeedZeroOptimizer' object has no attribute 'ipg_index'
My config.json is as follows:
{
"gradient_accumulation_steps": 1,
"train_micro_batch_size_per_gpu":1,
"steps_per_print": 100,
"optimizer": {
"type": "Adam",
"params": {
"lr": 0.00001,
"weight_decay": 1e-2
}
},
"flops_profiler": {
"enabled": false,
"profile_step": 100,
"module_depth": -1,
"top_modules": 3,
"detailed": true
},
"fp16": {
"enabled": true,
"loss_scale": 0,
"loss_scale_window": 1000,
"initial_scale_power": 18,
"hysteresis": 2,
"min_loss_scale": 1
},
"zero_optimization": {
"stage": 1,
"cpu_offload": false,
"contiguous_gradients": true,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size":1e8,
"allgather_bucket_size": 5e8
},
"activation_checkpointing": {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false
},
"gradient_clipping": 1.0,
"wall_clock_breakdown": false,
"zero_allow_untested_optimizer": true
}
Hi @TianhaoFu, can you share ds_report with me? I am curious what DeepSpeed version or commit hash you were on. I am trying to reproduce your issue.
Also, if this issue is quick to reproduce, can you also try with "stage": 2?
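(For reference, ds_report is the environment-report utility that ships with the deepspeed package; running the ds_report command in a shell prints the op-compatibility and version table like the one shown later in this thread.)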
config:
{
"zero_optimization": {
"stage": 1,
"overlap_comm": true
},
"fp16": {
"enabled": true,
"loss_scale": 0,
"initial_scale_power": 32,
"loss_scale_window": 1000,
"hysteresis": 2,
"min_loss_scale": 1
},
"train_batch_size": 8,
"steps_per_print": 4000,
"optimizer": {
"type": "Adam",
"params": {
"lr": 0.001,
"adam_w_mode": true,
"betas": [
0.8,
0.999
],
"eps": 1e-8,
"weight_decay": 3e-7
}
},
"scheduler": {
"type": "WarmupDecayLR",
"params": {
"warmup_min_lr": 0,
"warmup_max_lr": 0.001,
"warmup_num_steps": 10000,
"total_num_steps": 100000
}
},
"wall_clock_breakdown": false
}
I get the following error:
File "/opt/conda/lib/python3.8/site-packages/torch/_tensor.py", line 251, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 146, in backward
Variable._execution_engine.run_backward(
File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/zero/stage2.py", line 664, in reduce_partition_and_remove_grads
self.reduce_ready_partitions_and_remove_grads(param, i)
File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/zero/stage2.py", line 1109, in reduce_ready_partitions_and_remove_grads
self.reduce_independent_p_g_buckets_and_remove_grads(param, i)
File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/zero/stage2.py", line 726, in reduce_independent_p_g_buckets_and_remove_grads
new_grad_tensor = self.ipg_buffer[self.ipg_index].narrow(
AttributeError: 'FP16_DeepSpeedZeroOptimizer' object has no attribute 'ipg_index'
Env:
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [YES] ...... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/opt/conda/lib/python3.6/site-packages/torch']
torch version .................... 1.8.0a0+17f8c32
torch cuda version ............... 11.1
nvcc version ..................... 11.1
deepspeed install path ........... ['/opt/conda/lib/python3.6/site-packages/deepspeed']
deepspeed info ................... 0.4.4+6ba9628, 6ba9628, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
Hi @chrjxj, can you try setting "stage": 1,
in your config json to "stage": 2,
? I want to confirm if your issue occurs with both stages of zero. I am unable to reproduce the error on my side yet.
Actually, @chrjxj, can you set these both to false in your config? I suspect this will fix your issue.
"contiguous_gradients": false,
"overlap_comm": false,
@jeffra thanks. It still doesn't work and throws a new error message.
@chrjxj, can you provide the stack trace for the new error message?
Hi @chrjxj, did you find a solution?
@antoiloui, are you also seeing this error? Can you share the DeepSpeed version you are using and the stack trace? Did you also try turning off contiguous_gradients and overlap_comm?
Hi @jeffra, yes I'm experiencing the same issue. Here is the error I get:
File "/root/envs/star/lib/python3.8/site-packages/grad_cache/grad_cache.py", line 242, in forward_backward
surrogate.backward()
File "/root/envs/star/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/root/envs/star/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
Variable._execution_engine.run_backward(
File "/root/envs/star/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 769, in reduce_partition_and_remove_grads
self.reduce_ready_partitions_and_remove_grads(param, i)
File "/root/envs/star/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1250, in reduce_ready_partitions_and_remove_grads
self.reduce_independent_p_g_buckets_and_remove_grads(param, i)
File "/root/envs/star/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 826, in reduce_independent_p_g_buckets_and_remove_grads
new_grad_tensor = self.ipg_buffer[self.ipg_index].narrow(
AttributeError: 'DeepSpeedZeroOptimizer' object has no attribute 'ipg_index'
And here is my config file:
{
"zero_optimization": {
"stage": 2,
"offload_optimizer": {
"device": "cpu",
"pin_memory": true
},
"allgather_partitions": true,
"allgather_bucket_size": 2e8,
"reduce_scatter": true,
"reduce_bucket_size": 2e8,
"overlap_comm": false,
"contiguous_gradients": false
},
"steps_per_print": 2000,
"wall_clock_breakdown": false
}
Gotcha, I see. Thank you @antoiloui. What version of deepspeed are you running?
Is it possible to provide a repro for this error that you're seeing?
Hi @chrjxj, did you find a solution?
no... switched to other tasks...
Isn't this problem solved yet? I'm currently facing a similar error. I'm using FusedAdam as the optimizer, so I'm not using the FP16 option, but the error is similar.
Here is the error I get:
Traceback (most recent call last):
File "/root/QuickDraw/train.py", line 244, in <module>
train(opt)
File "/root/QuickDraw/train.py", line 165, in train
torch.autograd.backward(loss)
File "/project/lib/python3.9/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/project/lib/python3.9/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 857, in reduce_partition_and_remove_grads
self.reduce_ready_partitions_and_remove_grads(param, i)
File "/project/lib/python3.9/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1349, in reduce_ready_partitions_and_remove_grads
self.reduce_independent_p_g_buckets_and_remove_grads(param, i)
File "/project/lib/python3.9/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 902, in reduce_independent_p_g_buckets_and_remove_grads
new_grad_tensor = self.ipg_buffer[self.ipg_index].narrow(
AttributeError: 'DeepSpeedZeroOptimizer' object has no attribute 'ipg_index'
This is my deepspeed_config file:
{
"train_batch_size": 32,
"train_micro_batch_size_per_gpu": 8,
"gradient_accumulation_steps": 4,
"zero_optimization": {
"stage": 2,
"offload_optimizer": {
"device": "cpu"
},
"offload_param": {
"device": "cpu",
"pin_memory": true
},
"overlap_comm": true,
"contiguous_gradients": true
},
"steps_per_print": 1,
"optimizer": {
"type": "Adam",
"params": {
"lr": 0.001
}
}
}
"stage": 2 > "stage":1 Solved
Well, let me join this thread too... I have the same issue as described above.
The code I run can be found here: https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v4/train.py
The configuration I use:
{
"zero_allow_untested_optimizer": True,
"zero_optimization": {
"stage": 2,
"contiguous_gradients": True,
"overlap_comm": True,
"allgather_partitions": True,
"reduce_scatter": True,
"allgather_bucket_size": 200000000,
"reduce_bucket_size": 200000000,
"sub_group_size": 1000000000000,
},
"activation_checkpointing": {
"partition_activations": False,
"cpu_checkpointing": False,
"contiguous_memory_optimization": False,
"synchronize_checkpoint_boundary": False,
},
"aio": {
"block_size": 1048576,
"queue_depth": 8,
"single_submit": False,
"overlap_events": True,
"thread_count": 1,
},
"gradient_clipping": 1.0,
"gradient_accumulation_steps": 1,
"bf16": {"enabled": True},
}
Traceback:
Traceback (most recent call last):
File "train.py", line 367, in <module>
trainer.run(m_cfg, train_dataset, None, tconf)
File "/home/vscode/.local/lib/python3.8/site-packages/lightning_lite/lite.py", line 433, in _run_impl
return self._strategy.launcher.launch(run_method, *args, **kwargs)
File "/home/vscode/.local/lib/python3.8/site-packages/lightning_lite/strategies/launchers/subprocess_script.py", line 93, in launch
return function(*args, **kwargs)
File "/home/vscode/.local/lib/python3.8/site-packages/lightning_lite/lite.py", line 443, in _run_with_setup
return run_method(*args, **kwargs)
File "/home/alexkay28/RWKV-LM/RWKV-v4/src/trainer.py", line 177, in run
run_epoch('train')
File "/home/alexkay28/RWKV-LM/RWKV-v4/src/trainer.py", line 129, in run_epoch
self.backward(loss)
File "/home/vscode/.local/lib/python3.8/site-packages/lightning_lite/lite.py", line 260, in backward
self._precision.backward(tensor, module, *args, **kwargs)
File "/home/vscode/.local/lib/python3.8/site-packages/lightning_lite/plugins/precision/precision.py", line 68, in backward
tensor.backward(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/_tensor.py", line 482, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/vscode/.local/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 804, in reduce_partition_and_remove_grads
self.reduce_ready_partitions_and_remove_grads(param, i)
File "/home/vscode/.local/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1252, in reduce_ready_partitions_and_remove_grads
self.reduce_independent_p_g_buckets_and_remove_grads(param, i)
File "/home/vscode/.local/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 847, in reduce_independent_p_g_buckets_and_remove_grads
new_grad_tensor = self.ipg_buffer[self.ipg_index].narrow(0, self.elements_in_ipg_bucket, param.numel())
AttributeError: 'DeepSpeedZeroOptimizer' object has no attribute 'ipg_index'
Try changing 'stage' from 2 to 1 in the configuration. Do you still have the same problem? I understand that stage 2 improves training efficiency by also partitioning gradients when training a large model, but in my case this change solved the problem.
The official documentation describes the stages as follows:
Chooses different stages of ZeRO Optimizer. Stage 0, 1, 2, and 3 refer to disabled, optimizer state partitioning, optimizer+gradient state partitioning, and optimizer+gradient+parameter partitioning, respectively.
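In the config this is controlled by the single "stage" field, e.g. (illustrative fragment only, following the schema of the configs above):
"zero_optimization": {
"stage": 1
}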
I have solved my problem by choosing the right combination of Python version and package versions. If someone is interested in it:
- python 3.8
- torch==2.0.0
- deepspeed==0.9.1
- pytorch-lightning==1.9.1
You can see (in my traceback) that I was running DeepSpeed through the pytorch-lightning interface. I was also playing with some configurations, trying predefined configurations from Lightning like "deepspeed_strategy_2" and "deepspeed_strategy_3", and I got the same error every time, so I guess I just had a version compatibility problem.
This method can't solve my problem. I am also studying RWKV. Can you help me? My problem is this:
Traceback (most recent call last):
File "/data1/RWKV-LM/RWKV-v4/train.py", line 280, in <module>
trainer.run(m_cfg, train_dataset, None, tconf)
File "/opt/miniconda3/envs/rwkb_py38/lib/python3.8/site-packages/lightning_fabric/fabric.py", line 628, in _run_impl
return self._strategy.launcher.launch(run_method, *args, **kwargs)
File "/opt/miniconda3/envs/rwkb_py38/lib/python3.8/site-packages/lightning_fabric/strategies/launchers/subprocess_script.py", line 90, in launch
return function(*args, **kwargs)
File "/opt/miniconda3/envs/rwkb_py38/lib/python3.8/site-packages/lightning_fabric/fabric.py", line 638, in _run_with_setup
return run_function(*args, **kwargs)
File "/data1/RWKV-LM/RWKV-v4/src/trainer.py", line 177, in run
run_epoch('train')
File "/data1/RWKV-LM/RWKV-v4/src/trainer.py", line 129, in run_epoch
self.backward(loss)
File "/opt/miniconda3/envs/rwkb_py38/lib/python3.8/site-packages/lightning_fabric/fabric.py", line 359, in backward
self._precision.backward(tensor, module, *args, **kwargs)
File "/opt/miniconda3/envs/rwkb_py38/lib/python3.8/site-packages/lightning_fabric/plugins/precision/precision.py", line 73, in backward
tensor.backward(*args, **kwargs)
File "/opt/miniconda3/envs/rwkb_py38/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/opt/miniconda3/envs/rwkb_py38/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/opt/miniconda3/envs/rwkb_py38/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 804, in reduce_partition_and_remove_grads
self.reduce_ready_partitions_and_remove_grads(param, i)
File "/opt/miniconda3/envs/rwkb_py38/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1252, in reduce_ready_partitions_and_remove_grads
self.reduce_independent_p_g_buckets_and_remove_grads(param, i)
File "/opt/miniconda3/envs/rwkb_py38/lib/python3.8/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 847, in reduce_independent_p_g_buckets_and_remove_grads
new_grad_tensor = self.ipg_buffer[self.ipg_index].narrow(0, self.elements_in_ipg_bucket, param.numel())
AttributeError: 'DeepSpeedZeroOptimizer' object has no attribute 'ipg_index'
@maomao279 Have you tried v4neo? Also, are you sure you used the same versions during the run, and which CUDA version do you use? (Not sure the last one is important, I just want to know.)
I got the same issue, but fixed it by removing a redundant backward call.
outputs = model_engine(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
loss = outputs.loss
# loss.backward()  # remove this line: the engine must drive the backward pass itself
model_engine.backward(loss)
model_engine.step()
And this code came from ChatGPT, so it is excusable.
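That also fits the stack traces above: in the ZeRO stage 1/2 optimizer, ipg_buffer and ipg_index appear to be set up only inside the engine-driven backward (model_engine.backward(loss)), so a raw loss.backward() fires the gradient-reduction hooks before those attributes exist, which is exactly this AttributeError. (This is my reading of the traces, not an official explanation.)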
Has anybody found a solution other than using different package versions or changing to stage 1? Unfortunately I need stage 2 to work and cannot downgrade the package versions due to dependencies. Help really appreciated.
Any idea? I got a similar bug: AttributeError: 'DeepSpeedZeroOptimizer' object has no attribute 'ipg_index'
I solved it by using DeepSpeedEngine.backward(loss) and DeepSpeedEngine.step(), not the native torch loss.backward() and optimizer.step().
Thanks for sharing this update. Can you confirm that you were seeing the same error as in the original post?
Also, was your code following this guide for model porting: https://www.deepspeed.ai/getting-started/#writing-deepspeed-models
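The core pattern from that guide looks roughly like this (a minimal sketch, not code from this thread; model, data_loader, and the config path are placeholders):

import deepspeed

# deepspeed.initialize wraps the model in an engine that owns the optimizer,
# loss scaling, and the backward pass
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",
)

for batch in data_loader:
    loss = model_engine(batch)   # forward through the engine
    model_engine.backward(loss)  # engine backward: allocates the ZeRO ipg buffers
                                 # before autograd runs
    model_engine.step()          # optimizer step + gradient zeroing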