[Help] How to convert the model to torchscript
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
I tried to deploy the model by converting it to TorchScript with both `torch.jit.trace` and `torch.jit.script`, but both attempts failed.
Expected Behavior
No response
Steps To Reproduce
Here is my code:

```python
import torch
from transformers import AutoTokenizer, AutoModel

device = 'cuda' if torch.cuda.is_available() else 'cpu'


class Wrapper(torch.nn.Module):
    """
    Wrapper for the model to be traced
    """
    def __init__(self):
        super().__init__()
        self.model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().to(device)
        self.tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

    def forward(self, input_ids):
        self.model.eval()
        input_ids = input_ids.to(device)
        print('input_ids type:', input_ids.dtype)
        outputs = self.model.generate(input_ids=input_ids, max_length=2048, num_beams=1,
                                      do_sample=True, top_p=0.7, temperature=0.95)
        # return the output sequence without the prompt portion
        return outputs[0, len(input_ids[0]) - 2:]


tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
query = "hello"
inputs = tokenizer([query], return_tensors="pt", padding=True)
model = Wrapper()
# torch.jit.trace(model, (inputs.input_ids,)).save("chatglm-6b.pt")
traced_model = torch.jit.script(model)  # use torch.jit.script() instead of torch.jit.trace()
traced_model.save("chatglm-6b.pt")
```
Environment
- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Anything else?
No response
For `trace`, change the model loading to:

```python
self.model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, torchscript=True).half().to(device)
self.tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, torchscript=True)
```

For `script` there is much more that needs changing, e.g. `script` does not support `nn.Module` attributes like these, so I recommend using `trace`.
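As a minimal sketch of that trace path (assuming the `torchscript=True` fix above, and tracing a single forward pass rather than `generate()`, since `torch.jit.trace` only records one concrete execution):

```python
import torch
from transformers import AutoTokenizer, AutoModel

device = 'cuda' if torch.cuda.is_available() else 'cpu'

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
# torchscript=True makes the model's forward return plain tuples instead of
# ModelOutput objects, which is what torch.jit.trace expects
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b", trust_remote_code=True, torchscript=True
).half().to(device).eval()

input_ids = tokenizer(["hello"], return_tensors="pt").input_ids.to(device)

with torch.no_grad():
    # trace one forward pass with a concrete example input; the resulting
    # trace only generalizes to inputs that take the same code path
    traced = torch.jit.trace(model, (input_ids,))
traced.save("chatglm-6b.pt")
```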
Hey, any updates on this? When I try to trace the model I get the following warnings:
```
huggingface/modules/transformers_modules/THUDM/chatglm-6b/619e736c6d4cd139840579c5482063b75bed5666/modeling_chatglm.py:1000: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  layer_id=torch.tensor(i),
huggingface/modules/transformers_modules/THUDM/chatglm-6b/619e736c6d4cd139840579c5482063b75bed5666/modeling_chatglm.py:200: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if self.max_seq_len_cached is None or (seq_len > self.max_seq_len_cached):
huggingface/modules/transformers_modules/THUDM/chatglm-6b/619e736c6d4cd139840579c5482063b75bed5666/modeling_chatglm.py:267: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  query_key_layer_scaling_coeff = float(layer_id + 1)
huggingface/modules/transformers_modules/THUDM/chatglm-6b/619e736c6d4cd139840579c5482063b75bed5666/modeling_chatglm.py:269: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  query_layer = query_layer / (math.sqrt(hidden_size) * query_key_layer_scaling_coeff)
huggingface/modules/transformers_modules/THUDM/chatglm-6b/619e736c6d4cd139840579c5482063b75bed5666/modeling_chatglm.py:304: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if not (attention_mask == 0).all():
```

After that I successfully save the model. But when I try to load the recently saved model I get the following error:
```
Traceback (most recent call last):
  File "chatglm_ts.py", line 36, in <module>
    model_loaded = torch.jit.load('chatglm_full1.pth')
  File "/home/gkapustin/.local/lib/python3.8/site-packages/torch/jit/_serialization.py", line 162, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files, _restore_shapes)  # type: ignore[call-arg]
RuntimeError:
Expression of type - cannot be used in a type expression:
__torch__.transformers_modules.THUDM.chatglm-6b.619e736c6d4cd139840579c5482063b75bed5666.modeling_chatglm.ChatGLMForConditionalGeneration
                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
```
I've faced the same issue.

```
Traceback (most recent call last):
  File "/root/autodl-tmp/test.py", line 11, in <module>
```
Rename the directory "chatglm-6b" to "chatglm_6b" (without the dash) to solve this error:

```
Expression of type - cannot be used in a type expression:
__torch__.transformers_modules.THUDM.chatglm-6b.619e736c6d4cd139840579c5482063b75bed5666.modeling_chatglm.ChatGLMForConditionalGeneration
                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
```

TorchScript derives type names from the Python module path, and a dash is not a valid character in a Python identifier, which is why `torch.jit.load` rejects the type expression.
Renaming the directory "chatglm-6b" to "chatglm_6b" fixed the issue for me.
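For completeness, a minimal loading sketch (assuming the cached module directory was renamed to "chatglm_6b" before tracing and saving, so the serialized type name contains no dash; the checkpoint file name itself may keep the dash):

```python
import torch

# load the TorchScript checkpoint saved by the tracing script above;
# map_location allows loading on a CPU-only machine
model_loaded = torch.jit.load("chatglm-6b.pt", map_location="cpu")
model_loaded.eval()
```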