[BUG] AttributeError: 'Parameter' object has no attribute 'scale'
Describe the bug
I tried to apply the DeepSpeed InferenceEngine to GPT-J-6B but ran into the error AttributeError: 'Parameter' object has no attribute 'scale'. I can successfully speed up GPT2 and GPT-NEO, and I didn't find similar issues while searching the internet, so I am not sure what happened. I have no problem doing inference if I directly use the model loaded from Hugging Face; I only run into this error after applying the InferenceEngine to the model.
To Reproduce
Steps to reproduce the behavior:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "EleutherAI/gpt-j-6b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)

import deepspeed

# Wrap the model with the DeepSpeed inference engine (kernel injection enabled)
ds_model = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.float16,
    replace_method="auto",
    replace_with_kernel_inject=True,
)

text = "This is a sample prompt"
tokens = tokenizer.encode(text, return_tensors='pt').to(ds_model.module.device)
_ = model(tokens)  # the traceback below shows this call goes through the injected DeepSpeed modules
```
Below is the error traceback
```
------------------------------------------------------
Free memory : 3.268677 (GigaBytes)
Total memory: 15.772339 (GigaBytes)
Requested memory: 0.546875 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
WorkSpace: 0x7f2732000000
------------------------------------------------------
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/transformers/models/gptj/modeling_gptj.py", line 852, in forward
    transformer_outputs = self.transformer(
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/transformers/models/gptj/modeling_gptj.py", line 687, in forward
    outputs = block(
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 161, in forward
    output = self.mlp(attention_output, input, inp_norm, self.attention.attn_ob)
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH1/anaconda3/envs/deepspeed/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/ds_mlp.py", line 65, in forward
    output = self.fused_gemm_gelu(input=residual_norm,
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/op_binding/gelu_gemm.py", line 26, in forward
    output = self.fused_gemm_gelu(input, weight, weight.scale, bias, weight_out, weight_out.scale,
AttributeError: 'Parameter' object has no attribute 'scale'
```
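For context, the failing line accesses a scale attribute that a plain nn.Parameter does not have; attributes like this are normally attached only when a quantization path wraps the weight. A minimal sketch of the failure mode (the hasattr guard at the end mirrors the pattern visible in the post-fix traceback later in this thread, not current DeepSpeed code):

```python
import torch

# A plain fp16 Parameter, as loaded from the HF checkpoint, carries no
# quantization metadata, so accessing .scale raises AttributeError:
weight = torch.nn.Parameter(torch.randn(16, 16, dtype=torch.float16))
print(hasattr(weight, "scale"))  # False; weight.scale would raise

# A defensive guard avoids the AttributeError by falling back to a dummy tensor:
scale = weight.scale if hasattr(weight, "scale") else torch.empty(1)
print(scale.shape)
```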
Expected behavior
I would expect the inference call to work and return logits and past_key_values.
ds_report output
```
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1
[WARNING] please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['MYPATH1/pytorch/torch']
torch version .................... 2.1.0a0+gitb8580b0
deepspeed install path ........... ['MYPATH2/anaconda3/envs/deepspeed/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.9.1+cc67f22f, cc67f22f, master
torch cuda version ............... 12.0
torch hip version ................ None
nvcc version ..................... 12.0
deepspeed wheel compiled w. ...... torch 2.1, cuda 12.0
```
Screenshots
No screenshots available.
System info (please complete the following information):
- OS: Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-1051-aws x86_64)
- GPU count and types: 8x V100, but only using 1 in this experiment
- DeepSpeed version: '0.9.1+cc67f22f', installed with pip install git+...
- Hugging Face Transformers version: '4.29.0.dev0', installed with pip install git+...
- Python version: 3.9.16
- CUDA: 12.0
Docker context
Not using Docker.
@hemangjoshi37a Thanks for replying. I am doing this investigation. However, I find it strange, as inference support for GPT-J-6B was already requested back in 2021 (see issue 1332), and my impression is that it is supported. When I inspect the InferenceEngine object, I can see that the original layers have been replaced, which means the InferenceEngine is supposed to work for GPT-J-6B.
I will keep looking into the reason, but I do believe this is a bug that should be fixed.
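For reference, a quick way to perform this layer-replacement check (a sketch, not an official API; the class name comes from the traceback above) is to walk the wrapped module tree:

```python
# With kernel injection, the model's GPTJBlock layers should show up
# as DeepSpeed inference modules such as DeepSpeedTransformerInference.
for name, module in ds_model.module.named_modules():
    if "DeepSpeed" in type(module).__name__:
        print(name, "->", type(module).__name__)
```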
I also encountered some problems during debugging, so I would like to share them here as well. It would be great if I could get some help with the correct debugging steps/settings.
I tried to add breakpoints at all the return lines in the DeepSpeed inference classes, including:
- the DeepSpeedMLP class, forward method, line 87, in file ds_mlp.py
- the DeepSpeedTransformerInference class, forward method, lines 171, 173, and 175, in file ds_transformer.py
- the GELUGemmOp class (the class causing the error), forward method, lines 26 and 29, in file gelu_gemm.py
Surprisingly, the code either runs straight to completion when using gpt2 and gpt-neo-2.7b, totally ignoring my breakpoints, or stops automatically when the error is raised, showing only:
```
'Parameter' object has no attribute 'scale'
  File "MYPATH1/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH11/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "MYPATH11/pytorch/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  [Previous line repeated 3 more times]
```
I then tried to manually step into the forward call and realized that VSCode keeps jumping around within the same file, module.py in torch.nn.modules. It jumps between lines 1495-1501, which are the first 7 lines of the _call_impl method, and lines 1601-1613, which is the __getattr__ method. Of course, each time self is a different nn.Module object, but the debugger never steps into the definition files of these classes. I am not sure how to achieve that in order to get more details and compare different models in the GELUGemmOp class.
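One possible workaround for this kind of situation (a hedged sketch, not an official DeepSpeed debugging facility; the module path and class name are taken from the traceback above) is to monkeypatch the op's forward and log its arguments, instead of relying on IDE breakpoints:

```python
import torch
import deepspeed.ops.transformer.inference.op_binding.gelu_gemm as gg

_orig_forward = gg.GELUGemmOp.forward

def traced_forward(self, *args, **kwargs):
    # Log every tensor argument and whether it carries a .scale attribute,
    # which is exactly what the failing line expects to find.
    for i, a in enumerate(list(args) + list(kwargs.values())):
        if isinstance(a, torch.Tensor):
            print(f"arg {i}: {type(a).__name__}, shape={tuple(a.shape)}, "
                  f"has scale: {hasattr(a, 'scale')}")
    return _orig_forward(self, *args, **kwargs)

gg.GELUGemmOp.forward = traced_forward  # apply before running inference
```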
I really appreciate all the help from the community!
@hemangjoshi37a I seriously doubt that you are a real person. Your GitHub profile looks nice at first glance, but I think you are polluting this community with ChatGPT responses. Looking at your response at https://github.com/microsoft/DeepSpeed/issues/3244 convinces me of this.
Also, I am a human like you and not an AI. For the context of this matter, my WhatsApp number is +917016525813; you can call me and check.
@ZeratuuLL I have encountered a similar problem. Have you found any solution that works?
File ".../miniconda3/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/op_binding/gelu_gemm.py", line 26, in forward
output = self.fused_gemm_gelu(input, weight, weight.scale, bias, weight_out, weight_out.scale,
AttributeError: 'Parameter' object has no attribute 'scale'
@tnlin Unfortunately, no progress yet. As I described in my debugging problems above, I am blocked from observing the details and cannot understand what the difference is. My only guess is that GPT-J uses RoPE (rotary position embeddings) while GPT-2 and GPT-Neo just learn their absolute position embeddings.
Hi all, sorry for the slow response time on this! I have created a PR (https://github.com/microsoft/DeepSpeed/pull/3256) where I am now seeing model outputs match the HuggingFace baseline. If anyone has a chance to validate this locally as well, that would be a great help! Thanks!
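For anyone who wants to validate the PR locally, one way to install it before it is merged (a sketch, assuming a pip-based setup; refs/pull/<n>/head is GitHub's standard pull-request head ref, not an instruction from the PR itself) is:

```
pip install --force-reinstall git+https://github.com/microsoft/DeepSpeed.git@refs/pull/3256/head
```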
@cmikeh2 Hi, I merged PR (https://github.com/microsoft/DeepSpeed/pull/3256) into my local DeepSpeed code and rebuilt it, but then I encountered another error:

```
File "/opt/conda/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/op_binding/gelu_gemm.py", line 26, in forward
    output = self.fused_gemm_gelu(input, weight, weight.scale if hasattr(weight, "scale") else torch.empty(1),
RuntimeError: CUDA error: an illegal memory access was encountered
```