intel-extension-for-pytorch: How to run IPEX inference more than once

I am trying to set up a function that uses IPEX and runs inference with as little load time as possible. That means I want to reuse the traced_model variable instead of reloading the file for every inference. So far, I have not found a way to do this.

The model runs once, but the second call always fails with an error. I am looking for a way to run the setup once and then use the model several times. The error, a code sample, and the associated Colab link are shown below: https://colab.research.google.com/drive/1FLRzfv5Ir_a2bcPQUUYl-QTapxP0FXC6?usp=sharing

RuntimeError                              Traceback (most recent call last)
<ipython-input-42-699bd88acb21> in <module>()
      1 for i in range(2):
----> 2   test_out = loaded_model2(input_ids,attention_mask)
      3   print(test_out[0][0].detach().numpy())

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1128         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130             return forward_call(*input, **kwargs)
   1131         # Do not call functions when jit is used
   1132         full_backward_hooks, non_full_backward_hooks = [], []

RuntimeError: 0 INTERNAL ASSERT FAILED at "../torch/csrc/jit/ir/alias_analysis.cpp":608, please report a bug to PyTorch. We don't have an op for ipex::distil_mha_scores_calc but it isn't a special case.  Argument types: Tensor, Tensor, Tensor, int[], int, int, Tensor, float, 

Candidates:
	ipex::distil_mha_scores_calc(Tensor q, Tensor k, Tensor mask_qk, int[] mask_qk_reshp, int transpose_dim_a, int transpose_dim_b, Scalar fill, Scalar dim_per_head) -> (Tensor)

#!pip -q install transformers
#!python -m pip install intel_extension_for_pytorch==1.12.100

import torch
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

name = "distilbert-base-uncased-finetuned-sst-2-english"
model = DistilBertForSequenceClassification.from_pretrained(name, torchscript=True)
tokenizer = DistilBertTokenizer.from_pretrained(name)

input_text="This movie was really horrible and I did not like it!"
inputs = tokenizer(input_text, padding="max_length", max_length=512, return_tensors="pt")
input_ids=torch.tensor(inputs["input_ids"].numpy())
attention_mask=torch.tensor(inputs["attention_mask"].numpy())

model.eval()

import intel_extension_for_pytorch as ipex
model = ipex.optimize(model)

with torch.no_grad():
  traced_model = torch.jit.trace(model, [input_ids, attention_mask], check_trace=False, strict=False)
  traced_model = torch.jit.freeze(traced_model)
torch.jit.save(traced_model, "traced_bert.pt")

loaded_model = torch.jit.load("traced_bert.pt")

# Deep-copying the loaded model does not help
import copy
loaded_model2=copy.deepcopy(loaded_model)

# Runs correctly the first time but won't run again unless I reload the model from disk
for i in range(2):
  test_out = loaded_model2(input_ids,attention_mask)
  print(test_out[0][0].detach().numpy())

Aug 15 '22 16:08 ActionPace

Thanks for reporting this issue. We will look into it.

Aug 15 '22 21:08 jingxu10

As a workaround for now, please add torch._C._jit_set_profiling_mode(False) before tracing:

with torch.no_grad():
  torch._C._jit_set_profiling_mode(False)
  traced_model = torch.jit.trace(model, [input_ids, attention_mask], check_trace=False, strict=False)
  traced_model = torch.jit.freeze(traced_model)

Aug 15 '22 21:08 jingxu10
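
For the original goal of paying the load cost only once, one possible pattern is to memoize the loader so the file is read on the first call and the same in-memory model is returned afterwards. The sketch below is not from the IPEX maintainers: get_model and _default_loader are hypothetical names, and it assumes the profiling-mode workaround above should also be applied in the process that loads and runs the saved model.

```python
import functools
from typing import Any, Callable


def _default_loader(path: str) -> Any:
    """Load a TorchScript file with the profiling-mode workaround applied."""
    # Imported lazily so the caching logic can be tried without torch installed.
    import torch

    # Workaround quoted in this thread. Assumption: it is also needed in the
    # process that runs the loaded model, not only where the model was traced.
    torch._C._jit_set_profiling_mode(False)
    return torch.jit.load(path)


@functools.lru_cache(maxsize=None)
def get_model(path: str, loader: Callable[[str], Any] = _default_loader) -> Any:
    """Return the model for `path`, calling `loader` only on the first request."""
    return loader(path)


# Usage with the file saved earlier in this thread (requires torch and IPEX):
#   import intel_extension_for_pytorch as ipex  # registers the ipex:: ops
#   model = get_model("traced_bert.pt")
#   with torch.no_grad():
#       for _ in range(2):
#           test_out = model(input_ids, attention_mask)
```

Because the cache is keyed on the path, repeated calls to get_model("traced_bert.pt") return the same object, so an inference function can call it on every request without touching the disk again.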