
[zero-3] add support for new params added during fwd pass

Open jeffra opened this issue 4 years ago • 8 comments

/cc @stas00

jeffra avatar Dec 01 '21 23:12 jeffra

Does this code also remove the old param that is no longer in the submodule?

In the particular case of https://github.com/huggingface/transformers/blob/fbe278c76c56d97df98b5884e6856c168cd2a396/src/transformers/models/m2m_100/modeling_m2m_100.py#L133-L134, the param added during forward is not actually a new param, but a replacement for the old param with the same name; it just gets resized.
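For context, the pattern in question looks roughly like this. This is a simplified, hypothetical sketch of the linked M2M-100 sinusoidal positional-embedding module, paraphrased rather than copied; the key point is that a parameter named `weights` is rebuilt at a larger size inside `forward`:

```python
import torch
import torch.nn as nn

class SinusoidalPositionalEmbedding(nn.Module):
    """Simplified sketch: self.weights is rebuilt (resized) inside
    forward whenever a longer sequence than previously seen arrives."""

    def __init__(self, num_positions, embedding_dim):
        super().__init__()
        self.embedding_dim = embedding_dim
        self.make_weights(num_positions, embedding_dim)

    def make_weights(self, num_embeddings, embedding_dim):
        # Replaces any existing self.weights with a larger, freshly built
        # parameter of the same name -- this mid-training replacement is
        # what ZeRO-3 does not expect, since it only saw the original
        # param when the engine was initialized.
        self.weights = nn.Parameter(torch.zeros(num_embeddings, embedding_dim))
        self.weights.requires_grad = False

    def forward(self, input_ids):
        max_pos = input_ids.shape[1]
        if max_pos > self.weights.size(0):
            # param replaced in the middle of a forward pass
            self.make_weights(max_pos, self.embedding_dim)
        return self.weights[:max_pos].unsqueeze(0)
```

A forward pass with a sequence longer than `num_positions` triggers the resize, which under ZeRO-3 trips the partitioning bookkeeping shown in the tracebacks below.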

stas00 avatar Dec 02 '21 01:12 stas00

new failure with this branch:

E             File "/mnt/nvme1/code/huggingface/transformers-ds-model-zoo-2/examples/pytorch/translation/run_translation.py", line 620, in <module>
E               main()
E             File "/mnt/nvme1/code/huggingface/transformers-ds-model-zoo-2/examples/pytorch/translation/run_translation.py", line 537, in main
E               train_result = trainer.train(resume_from_checkpoint=checkpoint)
E             File "/mnt/nvme1/code/huggingface/transformers-ds-model-zoo-2/src/transformers/trainer.py", line 1316, in train
E               tr_loss_step = self.training_step(model, inputs)
E             File "/mnt/nvme1/code/huggingface/transformers-ds-model-zoo-2/src/transformers/trainer.py", line 1849, in training_step
E               loss = self.compute_loss(model, inputs)
E             File "/mnt/nvme1/code/huggingface/transformers-ds-model-zoo-2/src/transformers/trainer.py", line 1881, in compute_loss
E               outputs = model(**inputs)
E             File "/home/stas/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
E               return forward_call(*input, **kwargs)
E             File "/mnt/nvme1/code/github/00optimize/deepspeed/deepspeed/runtime/engine.py", line 1606, in forward
E               loss = self.module(*inputs, **kwargs)
E             File "/home/stas/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1120, in _call_impl
E               result = forward_call(*input, **kwargs)
E             File "/mnt/nvme1/code/huggingface/transformers-ds-model-zoo-2/src/transformers/models/m2m_100/modeling_m2m_100.py", line 1303, in forward
E               outputs = self.model(
E             File "/home/stas/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1120, in _call_impl
E               result = forward_call(*input, **kwargs)
E             File "/mnt/nvme1/code/huggingface/transformers-ds-model-zoo-2/src/transformers/models/m2m_100/modeling_m2m_100.py", line 1163, in forward
E               encoder_outputs = self.encoder(
E             File "/home/stas/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1120, in _call_impl
E               result = forward_call(*input, **kwargs)
E             File "/mnt/nvme1/code/huggingface/transformers-ds-model-zoo-2/src/transformers/models/m2m_100/modeling_m2m_100.py", line 776, in forward
E               embed_pos = self.embed_positions(input_ids, inputs_embeds)
E             File "/home/stas/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1120, in _call_impl
E               result = forward_call(*input, **kwargs)
E             File "/home/stas/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
E               return func(*args, **kwargs)
E             File "/mnt/nvme1/code/huggingface/transformers-ds-model-zoo-2/src/transformers/models/m2m_100/modeling_m2m_100.py", line 177, in forward
E               self.make_weights(max_pos + self.offset, self.embedding_dim, self.padding_idx)
E             File "/mnt/nvme1/code/huggingface/transformers-ds-model-zoo-2/src/transformers/models/m2m_100/modeling_m2m_100.py", line 136, in make_weights
E               self.weights.requires_grad = False
E             File "/home/stas/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1168, in __getattr__
E               return _parameters[name]
E             File "/mnt/nvme1/code/github/00optimize/deepspeed/deepspeed/runtime/zero/stage3.py", line 150, in __getitem__
E               assert len(zero_params) > 0
E           AssertionError

stas00 avatar Dec 02 '21 01:12 stas00

To reproduce:

git clone https://github.com/stas00/transformers/
cd transformers
git checkout ds-model-zoo-2
RUN_SLOW=1 pytest tests/deepspeed/test_model_zoo.py -k test_zero_to_fp32_zero3_trans_m2m_100 -sv

with master it fails with:

    self.weights.requires_grad = False
  File "/home/stas/anaconda3/envs/py38-pt110/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1168, in __getattr__
    return _parameters[name]
  File "/mnt/nvme1/code/github/00optimize/deepspeed/deepspeed/runtime/zero/stage3.py", line 146, in __getitem__
    if param.ds_status == ZeroParamStatus.NOT_AVAILABLE:
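The failure mode can be reproduced in miniature without torch: `nn.Module.__getattr__` only runs when normal attribute lookup fails, and it resolves the name through the module's `_parameters` mapping, which ZeRO-3 stage 3 swaps out for its own dict subclass. The sketch below is a hypothetical stand-in for that mapping (`ds_ids`, `FakeParam`, and the assertion message are made up for illustration; only the assert-on-missing-ZeRO-state behavior mirrors the traceback):

```python
class ZeroParamsDict(dict):
    """Stand-in for the mapping ZeRO-3 substitutes for _parameters:
    it expects every stored value to carry ZeRO bookkeeping."""

    def __getitem__(self, name):
        value = super().__getitem__(name)
        # ZeRO-3 tracks per-param partitioning state; a param created
        # or replaced after engine init has none, so the lookup asserts.
        zero_params = getattr(value, "ds_ids", [])
        assert len(zero_params) > 0, f"param {name!r} unknown to ZeRO-3"
        return value

class FakeParam:
    def __init__(self, ds_ids=()):
        self.ds_ids = list(ds_ids)

params = ZeroParamsDict()
params["weights"] = FakeParam(ds_ids=[0])  # registered at init: fine
params["weights"]                          # lookup succeeds

params["weights"] = FakeParam()            # replaced mid-forward
try:
    params["weights"]                      # ZeRO state missing
except AssertionError as e:
    print("lookup failed:", e)
```

This is why the error surfaces at an innocuous-looking line like `self.weights.requires_grad = False`: the attribute read routes through the patched mapping, which no longer recognizes the freshly created param.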

stas00 avatar Dec 02 '21 01:12 stas00

Thanks @stas00, I see the issue now. I think I know how we can address this, will give it a try. I was able to repro the issue with your script as well.

The current PR supports adding an entirely new parameter; replacing an existing parameter is a slightly different case that seems to need separate handling. Will update soon.
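One way an engine could distinguish the two cases (brand-new params vs. in-place replacements under the same name) is to snapshot parameter identities and diff them before each forward. This is a hypothetical sketch, not necessarily the approach this PR takes; `find_param_changes` is an invented helper:

```python
import torch
import torch.nn as nn

def find_param_changes(module, seen):
    """Compare the module's current params against a previously
    recorded {name: id(param)} snapshot. Returns newly added names,
    replaced names, and the fresh snapshot."""
    current = {n: id(p) for n, p in module.named_parameters()}
    new = [n for n in current if n not in seen]
    replaced = [n for n in current if n in seen and current[n] != seen[n]]
    return new, replaced, current

m = nn.Linear(2, 3)
_, _, seen = find_param_changes(m, {})

m.weight = nn.Parameter(torch.zeros(3, 2))                   # replaced, same name
m.register_parameter("extra", nn.Parameter(torch.zeros(1)))  # entirely new

new, replaced, _ = find_param_changes(m, seen)
# new     -> ["extra"]    (the case the PR already handles)
# replaced -> ["weight"]  (the M2M-100 resize case)
```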

jeffra avatar Dec 02 '21 20:12 jeffra

Additionally, I wonder if someone may have a case where they don't replace the pre-existing param, but instead remove it and add a new param under a different name. This is not the case I'm dealing with, but perhaps it could be another use case to support.
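That remove-and-rename case can be expressed with the standard `nn.Module` bookkeeping. A hypothetical illustration (real ZeRO-3 support would also have to drop and register its own partitioning state, which this sketch doesn't model):

```python
import torch
import torch.nn as nn

class SwappingModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.old_weight = nn.Parameter(torch.zeros(4))

    def swap_param(self):
        # Drop the pre-existing param entirely...
        del self._parameters["old_weight"]
        # ...and register a new one under a different name.
        self.register_parameter("new_weight", nn.Parameter(torch.ones(8)))

m = SwappingModule()
m.swap_param()
names = [n for n, _ in m.named_parameters()]
# names -> ["new_weight"]; old_weight no longer exists on the module
```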

stas00 avatar Dec 02 '21 23:12 stas00

a gentle ping

stas00 avatar Dec 09 '21 05:12 stas00

Created an issue for the problem this PR is trying to solve, so that it's easier to track: https://github.com/microsoft/DeepSpeed/issues/1757

stas00 avatar Feb 10 '22 04:02 stas00

Can one of the admins verify this patch?

rocm-mici avatar Jun 09 '22 20:06 rocm-mici

@jeffra Is this PR still relevant? If so I can revive these changes with the current master branch.

jomayeri avatar Aug 25 '23 23:08 jomayeri