[Help] How to convert the model to torchscript
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
I tried to deploy the model by converting it to TorchScript with both `torch.jit.trace` and `torch.jit.script`, but both attempts failed.
Expected Behavior
No response
Steps To Reproduce
Here is my code:

```python
import torch
from transformers import AutoTokenizer, AutoModel

device = 'cuda' if torch.cuda.is_available() else 'cpu'


class Wrapper(torch.nn.Module):
    """
    Wrapper for the model to be traced
    """
    def __init__(self):
        super().__init__()
        self.model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().to(device)
        self.tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

    def forward(self, input_ids):
        self.model.eval()
        input_ids = input_ids.to(device)
        print('input_ids type:', input_ids.dtype)
        outputs = self.model.generate(input_ids=input_ids, max_length=2048, num_beams=1,
                                      do_sample=True, top_p=0.7, temperature=0.95)
        # return the output sequence without the prompt portion
        return outputs[0, len(input_ids[0]) - 2:]


tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
query = "hello"
inputs = tokenizer([query], return_tensors="pt", padding=True)
model = Wrapper()
# torch.jit.trace(model, (inputs.input_ids,)).save("chatglm-6b.pt")
traced_model = torch.jit.script(model)  # use torch.jit.script() instead of torch.jit.trace()
traced_model.save("chatglm-6b.pt")
```
Environment
- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Anything else?
No response
For `trace`, change the model loading to:

```python
self.model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, torchscript=True).half().to(device)
self.tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, torchscript=True)
```

For `script` there is much more that needs changing, e.g. `script` does not support `nn.Module` attributes like these, so I recommend using `trace`.
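As a minimal sketch of that trace path (assuming the `torchscript=True` fix above, and tracing a single forward pass rather than `generate()`, since `torch.jit.trace` only records one concrete execution):

```python
import torch
from transformers import AutoTokenizer, AutoModel

device = 'cuda' if torch.cuda.is_available() else 'cpu'

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
# torchscript=True makes the model's forward return plain tuples instead of
# ModelOutput objects, which is what torch.jit.trace expects
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b", trust_remote_code=True, torchscript=True
).half().to(device).eval()

input_ids = tokenizer(["hello"], return_tensors="pt").input_ids.to(device)

with torch.no_grad():
    # trace one forward pass with a concrete example input; the resulting
    # trace only generalizes to inputs that take the same code path
    traced = torch.jit.trace(model, (input_ids,))
traced.save("chatglm-6b.pt")
```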
Hey, any updates on this? When I try to trace the model I get the following warnings:
```
huggingface/modules/transformers_modules/THUDM/chatglm-6b/619e736c6d4cd139840579c5482063b75bed5666/modeling_chatglm.py:1000: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  layer_id=torch.tensor(i),
huggingface/modules/transformers_modules/THUDM/chatglm-6b/619e736c6d4cd139840579c5482063b75bed5666/modeling_chatglm.py:200: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.max_seq_len_cached is None or (seq_len > self.max_seq_len_cached):
huggingface/modules/transformers_modules/THUDM/chatglm-6b/619e736c6d4cd139840579c5482063b75bed5666/modeling_chatglm.py:267: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  query_key_layer_scaling_coeff = float(layer_id + 1)
huggingface/modules/transformers_modules/THUDM/chatglm-6b/619e736c6d4cd139840579c5482063b75bed5666/modeling_chatglm.py:269: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  query_layer = query_layer / (math.sqrt(hidden_size) * query_key_layer_scaling_coeff)
huggingface/modules/transformers_modules/THUDM/chatglm-6b/619e736c6d4cd139840579c5482063b75bed5666/modeling_chatglm.py:304: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not (attention_mask == 0).all():
```

After that I successfully save the model. But when I try to load the recently saved model I get the following error:
```
Traceback (most recent call last):
  File "chatglm_ts.py", line 36, in <module>
    model_loaded = torch.jit.load('chatglm_full1.pth')
  File "/home/gkapustin/.local/lib/python3.8/site-packages/torch/jit/_serialization.py", line 162, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files, _restore_shapes)  # type: ignore[call-arg]
RuntimeError:
Expression of type - cannot be used in a type expression:
__torch__.transformers_modules.THUDM.chatglm-6b.619e736c6d4cd139840579c5482063b75bed5666.modeling_chatglm.ChatGLMForConditionalGeneration
                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
```
I've faced the same issue.

```
Traceback (most recent call last):
  File "/root/autodl-tmp/test.py", line 11, in <module>
```
Rename the directory "chatglm-6b" to "chatglm_6b" (without the dash) to solve this error:

```
Expression of type - cannot be used in a type expression:
__torch__.transformers_modules.THUDM.chatglm-6b.619e736c6d4cd139840579c5482063b75bed5666.modeling_chatglm.ChatGLMForConditionalGeneration
                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
```

TorchScript derives type names from the Python module path, and a dash is not a valid character in a Python identifier, which is why `torch.jit.load` rejects the type expression.
Renaming the directory "chatglm-6b" to "chatglm_6b" fixed the issue for me.
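For completeness, a minimal loading sketch (assuming the cached module directory was renamed to "chatglm_6b" before tracing and saving, so the serialized type name contains no dash; the checkpoint file name itself may keep the dash):

```python
import torch

# load the TorchScript checkpoint saved by the tracing script above;
# map_location allows loading on a CPU-only machine
model_loaded = torch.jit.load("chatglm-6b.pt", map_location="cpu")
model_loaded.eval()
```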