AttributeError: 'DeepSpeedZeRoOffload' object has no attribute 'backward'
Describe the bug
Hello, I'm a novice DeepSpeed user. I trained with the ds_config.json below, but got the error 'DeepSpeedZeRoOffload' object has no attribute 'backward'.
The config file is as follows. Can anyone give some suggestions? Thanks in advance!
{
    "train_batch_size": 4,
    "fp16": {
        "enabled": true,
        "autocast": false,
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": false,
            "nvme_path": "/home/tmp"
        },
        "offload_param": {
            "device": "cpu",
            "pin_memory": false,
            "nvme_path": "/home/tmp",
            "buffer_size": 1e10,
            "max_in_cpu": 1e9
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 5e8,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 5e8,
        "stage3_max_reuse_distance": 5e8,
        "stage3_gather_fp16_weights_on_model_save": true
    },
    "gradient_accumulation_steps": 1,
    "gradient_clipping": "auto",
    "steps_per_print": 2000,
    "train_micro_batch_size_per_gpu": 2,
    "wall_clock_breakdown": false
}
ds_report output
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
[WARNING] please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
spatial_inference ...... [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/letrain/miniconda/envs/bloom/lib/python3.8/site-packages/torch']
torch version .................... 1.12.0+cu102
torch cuda version ............... 10.2
torch hip version ................ None
nvcc version ..................... 10.2
deepspeed install path ........... ['/home/letrain/miniconda/envs/bloom/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.7.7, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.12, cuda 10.2
@upwindflys, are you trying to do training or inference? Can you share how to repro this, including command line and code?
I am having the same problem. It happens while trying to run training with offloading enabled. I am using Accelerate; however, this doesn't seem to be an isolated problem.
Are you passing an optimizer to deepspeed.initialize()? Can you share your code or steps to repro?
@upwindflys or @WadRex, are you able to resolve this issue by passing optimizer to deepspeed.initialize()?
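For reference, passing an optimizer to deepspeed.initialize() looks roughly like the sketch below (the tiny model, the input data, and the config path are placeholders, not taken from this issue):

import torch
import deepspeed

# Placeholder model; in this issue it would be the real training model.
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# With an explicit optimizer, the engine wraps a real ZeRO optimizer,
# so engine.backward() is available.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    optimizer=optimizer,
    config="ds_config.json",  # placeholder path to the ZeRO-3 config
)

inputs = torch.randn(2, 10).to(engine.device)
loss = engine(inputs).sum()  # forward through the engine
engine.backward(loss)        # backward via the engine, not loss.backward()
engine.step()                # optimizer step and gradient zeroing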
Closing for lack of response. Please reopen as needed.
Hello, I am also having the same issue. How did you solve it in the end?
Hi, I get the same issue too. My ds_config:
{
    "bf16": {
        "enabled": "true"
    },
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "sub_group_size": 1e9,
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_16bit_weights_on_model_save": "auto"
    },
    "gradient_accumulation_steps": 8,
    "gradient_clipping": "auto",
    "mixed_precision": "fp16",
    "steps_per_print": 2000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}
My training script:
for step, batch in enumerate(train_loader):
    with accelerator.accumulate(model):
        inputs = batch["input_ids"].to(accelerator.device)
        targets = batch["labels"].to(accelerator.device)
        model_output = model(input_ids=inputs, labels=targets, return_dict=True)
        loss = model_output.loss
        accelerator.backward(loss)
The error:
AttributeError: 'DeepSpeedZeRoOffload' object has no attribute 'backward'
I'm trying to do training, and I didn't pass an optimizer to deepspeed.initialize(). How can I solve it?
@Muttermal, you can pass an optimizer through ds_config as follows: https://www.deepspeed.ai/docs/config-json/#optimizer-parameters
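For example, an optimizer section in the ds_config looks like this (the AdamW type and the hyperparameter values here are illustrative defaults, not specific to this setup):

"optimizer": {
    "type": "AdamW",
    "params": {
        "lr": 2e-5,
        "betas": [0.9, 0.999],
        "eps": 1e-8,
        "weight_decay": 0.01
    }
}

With this section present, DeepSpeed constructs the optimizer itself, so nothing needs to be passed to deepspeed.initialize().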
@tjruwase Thank you for your reply. I passed an optimizer and scheduler through my ds_config. I use Accelerate for training and now get a new error: https://github.com/huggingface/transformers/issues/26148. This seems to be an issue with transformers or accelerate.
@Muttermal, the new issue should not exist with the latest versions of those libraries, unless there is a recent regression. Can you please share your failing stack trace here?