AttributeError: 'ScalingTensor' object has no attribute 'view'

LSC527 opened this issue 7 months ago · 3 comments

What's the issue, what's expected?: An AttributeError is raised when using MS-AMP for LLM supervised fine-tuning (SFT); training is expected to run without errors. MS-AMP DeepSpeed config: `"msamp": { "enabled": true, "opt_level": "O1"/"O2"/"O3" (all three tried), "use_te": false }`

How to reproduce it?: Follow the setup of DeepSpeed-Chat, then make two small code modifications to enable MS-AMP in DeepSpeed-Chat/training/step1_supervised_finetuning/main.py:

line 20 modify: import deepspeed -> from msamp import deepspeed

line 230 add: ds_config["msamp"] = { "enabled": True, "opt_level": "O1|O2|O3", "use_te": False }
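The two modifications above can be sketched as follows (a minimal, self-contained sketch based on the report; the line numbers, `ds_config` layout, and the `enable_msamp` helper name are illustrative, not from the DeepSpeed-Chat source):

```python
# Hypothetical sketch of the two edits to
# DeepSpeed-Chat/training/step1_supervised_finetuning/main.py.

# Edit 1 (around line 20): swap the DeepSpeed import for MS-AMP's wrapper.
#   import deepspeed             # before
#   from msamp import deepspeed  # after

# Edit 2 (around line 230): enable MS-AMP in the DeepSpeed config dict.
def enable_msamp(ds_config, opt_level="O1"):
    """Add an MS-AMP block to an existing DeepSpeed config dict."""
    ds_config["msamp"] = {
        "enabled": True,
        "opt_level": opt_level,  # "O1", "O2", and "O3" were all tried
        "use_te": False,
    }
    return ds_config

ds_config = {"train_batch_size": 8}  # placeholder config
enable_msamp(ds_config, "O2")
print(ds_config["msamp"]["opt_level"])  # → O2
```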

Log message or snapshot?:

Traceback (most recent call last):
  File "/home/work/DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/main.py", line 400, in <module>
    main()
  File "/home/work/DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/main.py", line 369, in main
    model.backward(loss)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/msamp/deepspeed/runtime/engine.py", line 405, in backward
    self.optimizer.backward(loss, retain_graph=retain_graph)
  File "/usr/local/lib/python3.10/dist-packages/msamp/deepspeed/runtime/zero/fp8_stage_1_and_2.py", line 951, in backward
    super().backward(loss.float(), retain_graph=retain_graph)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 2040, in backward
    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 491, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 288, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/usr/local/lib/python3.10/dist-packages/msamp/nn/functional.py", line 123, in backward
    ctx.weight.backward_grad_update(wgrad)
  File "/usr/local/lib/python3.10/dist-packages/msamp/common/tensor/tensor.py", line 130, in backward_grad_update
    self._backward_post_hooks(grad)
  File "/usr/local/lib/python3.10/dist-packages/msamp/common/tensor/hook.py", line 47, in __call__
    hook(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1581, in _call_impl
    hook_result = hook(self, args, result)
  File "/usr/local/lib/python3.10/dist-packages/msamp/deepspeed/runtime/zero/fp8_stage_1_and_2.py", line 386, in reduce_partition_and_remove_grads
    self.fp8_reduce_ready_partitions_and_remove_grads(param, i)
  File "/usr/local/lib/python3.10/dist-packages/msamp/deepspeed/runtime/zero/fp8_stage_1_and_2.py", line 595, in fp8_reduce_ready_partitions_and_remove_grads
    self.fp8_reduce_independent_p_g_buckets_and_remove_grads(param, i)
  File "/usr/local/lib/python3.10/dist-packages/msamp/deepspeed/runtime/zero/fp8_stage_1_and_2.py", line 412, in fp8_reduce_independent_p_g_buckets_and_remove_grads
    self.fp8_reduce_ipg_grads()
  File "/usr/local/lib/python3.10/dist-packages/msamp/deepspeed/runtime/zero/fp8_stage_1_and_2.py", line 541, in fp8_reduce_ipg_grads
    self.fp8_average_tensor(self.fp8_extra_large_param_to_reduce.grad.view(-1))
AttributeError: 'ScalingTensor' object has no attribute 'view'
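The final frame suggests the extra-large parameter's `.grad` is an MS-AMP ScalingTensor rather than a plain torch.Tensor, so calling `.view(-1)` on it fails. A minimal stub reproducing the same failure mode (hypothetical illustration only, not MS-AMP code; the `value`/`scale` fields are assumptions):

```python
# A wrapper that is not a torch.Tensor subclass does not inherit
# tensor methods such as .view(), hence the AttributeError.
class ScalingTensorStub:
    def __init__(self, value, scale):
        self.value = value  # underlying storage
        self.scale = scale  # scaling factor

grad = ScalingTensorStub(value=[1.0, 2.0], scale=0.5)
try:
    grad.view(-1)  # mirrors fp8_average_tensor(...grad.view(-1)) in the traceback
except AttributeError as e:
    print(e)  # → 'ScalingTensorStub' object has no attribute 'view'
```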

Additional information: Environment: ghcr.io/azure/msamp:v0.4.0-cuda12.2; GPU: 8× H100.

LSC527 · Jul 25 '24 07:07