
[BUG] AttributeError: 'Parameter' object has no attribute 'scale'

Open ZeratuuLL opened this issue 1 year ago • 8 comments

Describe the bug I tried to apply the DeepSpeed InferenceEngine to GPT-J-6B but ran into the error AttributeError: 'Parameter' object has no attribute 'scale'. I can successfully speed up GPT-2 and GPT-Neo, and I didn't find similar issues when searching online, so I'm not sure what happened.

Inference works fine when I use the model loaded directly from Hugging Face; I only run into this error after applying the InferenceEngine to the model.

To Reproduce Steps to reproduce the behavior:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the stock Hugging Face model in fp16
model_path = "EleutherAI/gpt-j-6b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)

import deepspeed

# Wrap the model with DeepSpeed's InferenceEngine, auto-detecting modules
# to replace with kernel-injected versions
ds_model = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.float16,
    replace_method="auto",
    replace_with_kernel_inject=True
)

text = "This is a sample prompt"
tokens = tokenizer.encode(text, return_tensors='pt').to(ds_model.module.device)
# Kernel injection replaces the original modules in place, so calling either
# `model` or `ds_model` goes through the DeepSpeed ops and hits the error
_ = ds_model(tokens)

Below is the error traceback:

------------------------------------------------------
Free memory : 3.268677 (GigaBytes)  
Total memory: 15.772339 (GigaBytes)  
Requested memory: 0.546875 (GigaBytes) 
Setting maximum total tokens (input + output) to 1024 
WorkSpace: 0x7f2732000000 
------------------------------------------------------
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/transformers/models/gptj/modeling_gptj.py", line 852, in forward
    transformer_outputs = self.transformer(
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/transformers/models/gptj/modeling_gptj.py", line 687, in forward
    outputs = block(
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 161, in forward
    output = self.mlp(attention_output, input, inp_norm, self.attention.attn_ob)
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH1/anaconda3/envs/deepspeed/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/ds_mlp.py", line 65, in forward
    output = self.fused_gemm_gelu(input=residual_norm,
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/op_binding/gelu_gemm.py", line 26, in forward
    output = self.fused_gemm_gelu(input, weight, weight.scale, bias, weight_out, weight_out.scale,
AttributeError: 'Parameter' object has no attribute 'scale'
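
For context on the failing line: the fused kernel apparently expects a quantization scale to be attached to the weight. A minimal sketch of the situation (my assumption: quantized code paths attach a scale attribute to weight Parameters, which a plain fp16 Parameter never has):

import torch

# A plain Parameter, as produced by a non-quantized fp16 checkpoint,
# has no `scale` attribute:
weight = torch.nn.Parameter(torch.randn(4, 4, dtype=torch.float16))
print(hasattr(weight, "scale"))  # False -> the same AttributeError as above

# Quantization code paths would attach one explicitly, e.g.:
weight.scale = torch.ones(1)
print(weight.scale)  # tensor([1.])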

Expected behavior I would expect the inference call to succeed and return logits and past_key_values.
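
For comparison, a sanity-check sketch of the stock Hugging Face path, reusing model_path, tokenizer, and text from the repro above (run in a fresh session so two copies of the 6B model don't need to fit in memory):

hf_model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16
).to("cuda")
tokens = tokenizer.encode(text, return_tensors='pt').to("cuda")
with torch.no_grad():
    out = hf_model(tokens)
print(out.logits.shape)          # (batch, seq_len, vocab_size)
print(len(out.past_key_values))  # one (key, value) pair per layer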

ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['MYPATH1/pytorch/torch']
torch version .................... 2.1.0a0+gitb8580b0
deepspeed install path ........... ['MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.9.1+cc67f22f, cc67f22f, master
torch cuda version ............... 12.0
torch hip version ................ None
nvcc version ..................... 12.0
deepspeed wheel compiled w. ...... torch 2.1, cuda 12.0

Screenshots No screenshots available

System info (please complete the following information):

  • OS: Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-1051-aws x86_64)
  • GPU count and types: 8× V100, but only 1 used in this experiment
  • DeepSpeed: '0.9.1+cc67f22f', installed with pip install git+...
  • Hugging Face Transformers: version '4.29.0.dev0', installed with pip install git+...
  • Python version: 3.9.16
  • CUDA: 12.0

Docker context Not using Docker


ZeratuuLL avatar Apr 15 '23 05:04 ZeratuuLL

@hemangjoshi37a Thanks for replying. I am looking into this. However, it feels strange, since inference support for GPT-J-6B was already requested back in 2021 (see issue 1332) and my impression is that it is supported. When I inspect the InferenceEngine object, I can see that the original layers have been replaced, which suggests the InferenceEngine is supposed to work for GPT-J-6B.

I will keep investigating the cause in the meantime, but I do believe this is a bug that should be fixed.

ZeratuuLL avatar Apr 15 '23 17:04 ZeratuuLL

I also encountered some problems while debugging, so I would like to share them here as well. It would be great if I could get some help with the correct debugging steps/settings.

I tried to set breakpoints on all the return lines in the DeepSpeed inference classes, including (see the sketch after my notes below):

  • DeepSpeedMLP class, forward method, line 87, in ds_mlp.py
  • DeepSpeedTransformerInference class, forward method, lines 171, 173, and 175, in ds_transformer.py
  • GELUGemmOp class (the class raising the error), forward method, lines 26 and 29, in gelu_gemm.py

Surprisingly, the code either runs to completion when using gpt2 and gpt-neo-2.7b, completely ignoring my breakpoints, or stops automatically when the error is raised, showing only

'Parameter' object has no attribute 'scale'
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH11/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH11/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  [Previous line repeated 3 more times]

I then tried to manually step into the forward call and noticed that VS Code keeps jumping around within the same file, module.py in torch.nn.modules: it bounces between lines 1495-1501, the first seven lines of the _call_impl method, and lines 1601-1613, the __getattr__ method. Each time, self is a different nn.Module object, but the debugger never steps into the definition files of these classes. I am not sure how to get there so I can gather more details and compare how different models behave in the GELUGemmOp class.
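
In case it helps, here is a pdb-based sketch I am considering that sidesteps the IDE entirely by monkey-patching the failing op's forward (assuming the module path shown in the traceback above):

import pdb

from deepspeed.ops.transformer.inference.op_binding import gelu_gemm

_orig_forward = gelu_gemm.GELUGemmOp.forward

def traced_forward(self, *args, **kwargs):
    # Drop into an interactive debugger right before the failing call,
    # so the weight object and its (missing) attributes can be inspected
    pdb.set_trace()
    return _orig_forward(self, *args, **kwargs)

gelu_gemm.GELUGemmOp.forward = traced_forward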

I really appreciate all the help from the community!

ZeratuuLL avatar Apr 16 '23 04:04 ZeratuuLL

@hemangjoshi37a I seriously doubt you are a real person. Your GitHub profile looks nice at first glance, but I think you are polluting this community with ChatGPT responses.

Looking at your response at https://github.com/microsoft/DeepSpeed/issues/3244 convinced me of this.

ZeratuuLL avatar Apr 16 '23 21:04 ZeratuuLL

Also, I am a human like you, not an AI. For the context of this matter, my WhatsApp number is +917016525813; you can call me and check.

hemangjoshi37a avatar Apr 16 '23 21:04 hemangjoshi37a

@ZeratuuLL I have encountered a similar problem; did you find any working solution?

File ".../miniconda3/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/op_binding/gelu_gemm.py", line 26, in forward
    output = self.fused_gemm_gelu(input, weight, weight.scale, bias, weight_out, weight_out.scale,
AttributeError: 'Parameter' object has no attribute 'scale'

tnlin avatar Apr 17 '23 02:04 tnlin

@tnlin unfortunately no progress yet. As I described in my debugging notes above, I am blocked from observing the details and cannot figure out what the difference is. My only guess is that GPT-J uses RoPE (rotary position embeddings), while GPT-2 and GPT-Neo just learn absolute position embeddings.

ZeratuuLL avatar Apr 17 '23 17:04 ZeratuuLL

Hi all, sorry for the slow response time on this! I have created a PR (https://github.com/microsoft/DeepSpeed/pull/3256) where I am now seeing model outputs match the HuggingFace baseline. If anyone has a chance to validate this locally as well, that would be a great help! Thanks!
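
If it helps, here is a sketch of one way to validate (my assumptions: pip can install straight from the PR ref, and fp16 kernels need loose comparison tolerances since they reorder reductions):

# Install the PR branch (adjust the ref to your setup):
#   pip install git+https://github.com/microsoft/DeepSpeed.git@refs/pull/3256/head
import torch

def outputs_match(hf_logits, ds_logits, atol=1e-2, rtol=1e-2):
    # Exact equality is not expected between the stock HF model and the
    # DeepSpeed engine; compare in fp32 with loose tolerances instead
    return torch.allclose(hf_logits.float(), ds_logits.float(), atol=atol, rtol=rtol)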

cmikeh2 avatar Apr 17 '23 23:04 cmikeh2

@cmikeh2 hi, I merged PR (https://github.com/microsoft/DeepSpeed/pull/3256) into my local DeepSpeed code and rebuilt it, but then I encountered another error:

  File "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/op_binding/gelu_gemm.py", line 26, in forward
    output = self.fused_gemm_gelu(input, weight, weight.scale if hasattr(weight, "scale") else torch.empty(1),
RuntimeError: CUDA error: an illegal memory access was encountered
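
Side note: since CUDA errors are reported asynchronously, the traceback above may point at the wrong op. I plan to re-run with synchronous kernel launches to localize the real failing kernel, roughly:

import os

# Must be set before any CUDA work happens in the process
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"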

CN-COTER avatar Apr 19 '23 11:04 CN-COTER