How to run IPEX inference more than once
I am trying to set up a function that uses IPEX and runs inference with as little load time as possible. That means I want to reuse the traced_model variable instead of reloading the file for every inference. So far, I have not found a way to do this.
The model runs once, but the second call always raises an error. I am looking for a way to run the setup once and then use the model several times. The error and a code sample are shown below, and the associated Colab notebook is here: https://colab.research.google.com/drive/1FLRzfv5Ir_a2bcPQUUYl-QTapxP0FXC6?usp=sharing
RuntimeError Traceback (most recent call last)
<ipython-input-42-699bd88acb21> in <module>()
1 for i in range(2):
----> 2 test_out = loaded_model2(input_ids,attention_mask)
3 print(test_out[0][0].detach().numpy())
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1129 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130 return forward_call(*input, **kwargs)
1131 # Do not call functions when jit is used
1132 full_backward_hooks, non_full_backward_hooks = [], []
RuntimeError: 0 INTERNAL ASSERT FAILED at "../torch/csrc/jit/ir/alias_analysis.cpp":608, please report a bug to PyTorch. We don't have an op for ipex::distil_mha_scores_calc but it isn't a special case. Argument types: Tensor, Tensor, Tensor, int[], int, int, Tensor, float,
Candidates:
ipex::distil_mha_scores_calc(Tensor q, Tensor k, Tensor mask_qk, int[] mask_qk_reshp, int transpose_dim_a, int transpose_dim_b, Scalar fill, Scalar dim_per_head) -> (Tensor)
#!pip -q install transformers
#!python -m pip install intel_extension_for_pytorch==1.12.100
import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
name = "distilbert-base-uncased-finetuned-sst-2-english"
model = DistilBertForSequenceClassification.from_pretrained(name, torchscript=True)
tokenizer = DistilBertTokenizer.from_pretrained(name)
input_text="This movie was really horrible and I did not like it!"
inputs = tokenizer(input_text, padding="max_length", max_length=512, return_tensors="pt")
input_ids=torch.tensor(inputs["input_ids"].numpy())
attention_mask=torch.tensor(inputs["attention_mask"].numpy())
model.eval()
import intel_extension_for_pytorch as ipex
model = ipex.optimize(model)
with torch.no_grad():
    traced_model = torch.jit.trace(model, [input_ids, attention_mask], check_trace=False, strict=False)
    traced_model = torch.jit.freeze(traced_model)
torch.jit.save(traced_model, "traced_bert.pt")
loaded_model = torch.jit.load("traced_bert.pt")
# Does not help
import copy
loaded_model2 = copy.deepcopy(loaded_model)
# Runs correctly the first time but won't run again unless I reload the model from disk
for i in range(2):
    test_out = loaded_model2(input_ids, attention_mask)
    print(test_out[0][0].detach().numpy())
Thanks for reporting this issue. We will look into it.
As a workaround for now, please call torch._C._jit_set_profiling_mode(False) before tracing the model:
with torch.no_grad():
    torch._C._jit_set_profiling_mode(False)
    traced_model = torch.jit.trace(model, [input_ids, attention_mask], check_trace=False, strict=False)
    traced_model = torch.jit.freeze(traced_model)
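
For anyone who wants to check the "set up once, call many times" pattern without downloading DistilBERT, here is a minimal sketch. It rests on several assumptions: TinyClassifier and build_traced_model are hypothetical stand-ins that are not part of the original report, the ipex.optimize step is left out so the snippet needs only PyTorch, and the model is saved to an in-memory buffer instead of traced_bert.pt. It therefore cannot reproduce the ipex::distil_mha_scores_calc assertion. It only shows where the workaround call sits relative to tracing, and that the loaded model is called repeatedly without being reloaded.

```python
import io

import torch

# Workaround from the reply above. The leading underscores mark this as a
# private PyTorch call, and the flag applies to the whole process.
torch._C._jit_set_profiling_mode(False)


class TinyClassifier(torch.nn.Module):
    """Hypothetical stand-in for the DistilBERT classifier (same call signature)."""

    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(100, 8)
        self.head = torch.nn.Linear(8, 2)

    def forward(self, input_ids, attention_mask):
        # Zero out padded positions, then sum over the sequence dimension.
        hidden = self.embed(input_ids) * attention_mask.unsqueeze(-1)
        return self.head(hidden.sum(dim=1))


def build_traced_model(example_inputs):
    """One-time setup: trace, freeze, save, and load the model."""
    model = TinyClassifier().eval()
    with torch.no_grad():
        traced = torch.jit.trace(model, example_inputs, check_trace=False, strict=False)
        traced = torch.jit.freeze(traced)
    buffer = io.BytesIO()  # stands in for "traced_bert.pt" on disk
    torch.jit.save(traced, buffer)
    buffer.seek(0)
    return torch.jit.load(buffer)


input_ids = torch.randint(0, 100, (1, 16))
attention_mask = torch.ones(1, 16, dtype=torch.long)

# Setup happens exactly once.
loaded_model = build_traced_model((input_ids, attention_mask))

# The same loaded module is then reused for every inference call.
with torch.no_grad():
    outputs = [loaded_model(input_ids, attention_mask) for _ in range(3)]

for out in outputs:
    print(out[0].numpy())
```

With the real model, the same structure would apply: call ipex.optimize(model) before tracing inside the setup function, and keep the returned module in a variable that outlives individual inference calls.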