[QUESTION] Export to onnx format
I am trying to convert the pre-trained models to ONNX format, following the tutorial at https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html
I created a fork from stanza for this experiment here https://github.com/vivkvv/stanza. See also my commits https://github.com/vivkvv/stanza/commits?author=vivkvv.
I used pipeline_demo.py for testing. The main change is in models/tokenization/trainer.py, just below line 77:
    pred = self.model(units, features)
Following the tutorial, I added
    torch.onnx.export(
        self.model,
        (units, features),
        onnx_export_file_name,
        opset_version=9,
        export_params=True,
        do_constant_folding=True,
        input_names=['input'],
        output_names=['output'],
        dynamic_axes={
            'input': {0: 'batch_size'},
            'output': {0: 'batch_size'}
        }
    )
and it works for tokenization. But the same approach does not work for, e.g., the POS tagger or the lemmatizer. In models/pos/trainer.py, in the predict method, below the lines
    self.model.eval()
    batch_size = word.size(0)
I added similar code:
    torch.onnx.export(
        self.model,
        (word, word_mask, wordchars, wordchars_mask, upos, xpos, ufeats, pretrained, word_orig_idx, sentlens, wordlens),
        onnx_export_file_name,
        opset_version=9,
        export_params=True,
        do_constant_folding=True,
        input_names=['input'],
        output_names=['output'],
        dynamic_axes={
            'input': {0: 'batch_size'},
            'output': {0: 'batch_size'}
        }
    )
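One thing that stands out in the call above: it passes eleven arguments but names only a single 'input', so dynamic_axes describes just one of them. A minimal sketch of building one name per tensor argument is shown here. The helper build_export_names is hypothetical (not part of stanza or PyTorch), and it assumes that every tensor has its batch dimension on axis 0, which may not hold for all of the tagger's inputs (wordchars, for example, would need checking).

```python
def build_export_names(input_names, batch_axis=0, output_names=("output",)):
    """Build input_names, output_names and dynamic_axes for torch.onnx.export.

    Hypothetical helper: marks `batch_axis` as dynamic on every named
    input and output. That is an assumption, not something the model guarantees.
    """
    input_names = list(input_names)
    output_names = list(output_names)
    dynamic_axes = {
        name: {batch_axis: "batch_size"} for name in input_names + output_names
    }
    return input_names, output_names, dynamic_axes


# Names of the tensor arguments, copied from the tuple in the export call above.
# word_orig_idx, sentlens and wordlens are left out because they are lists.
pos_inputs = [
    "word", "word_mask", "wordchars", "wordchars_mask",
    "upos", "xpos", "ufeats", "pretrained",
]
input_names, output_names, dynamic_axes = build_export_names(pos_inputs)
```

The three results can then be passed as the input_names, output_names and dynamic_axes keyword arguments of torch.onnx.export. This only fixes the naming; it does not by itself make the list arguments exportable.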
With this setup {python=3.7.11, stanza=1.2.3, pytorch=1.3.1, onnx=1.10.2, onnxruntime=1.9.0}, I get an error:
RuntimeError: Only tuples, lists and Variables supported as JIT inputs/outputs. Dictionaries and strings are also accepted but their usage is not recommended. But got unsupported type int
The error is raised in \lib\site-packages\torch\jit\__init__.py, line 329:
    def forward(self, *args):
        in_vars, in_desc = _flatten(args)
As far as I understand, this happens because word_orig_idx, sentlens, and wordlens are of type 'list' rather than 'torch.Tensor'.
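If that diagnosis is right, a quick way to confirm it is to list which arguments are not tensors before calling the exporter. The sketch below is only a diagnostic aid: non_tensor_args is a hypothetical helper, and FakeTensor is a stand-in so the example runs without PyTorch. In real use the predicate would be torch.is_tensor.

```python
def non_tensor_args(named_args, is_tensor):
    """Return {name: type name} for every argument that is_tensor() rejects."""
    return {
        name: type(value).__name__
        for name, value in named_args.items()
        if not is_tensor(value)
    }


class FakeTensor:
    """Stand-in for torch.Tensor so this example has no dependencies."""


# Illustrative values only; the real ones come from the batch in predict().
example_args = {
    "word": FakeTensor(),
    "word_mask": FakeTensor(),
    "sentlens": [5, 3],
    "wordlens": [[2, 3, 1, 4, 2], [3, 3, 2]],
}
offenders = non_tensor_args(example_args, lambda v: isinstance(v, FakeTensor))
print(offenders)
```

With PyTorch available, the call would look like non_tensor_args({...}, torch.is_tensor). Any name it reports is a candidate for conversion to a tensor, or for being moved out of the exported forward signature.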
Then I downloaded stanza from GitHub and continued the experiments with that version. With {python=3.7.12, stanza=1.3.0, pytorch=1.10.0, onnx=1.10.2, onnxruntime=1.9.0}, I get the following errors, depending on opset_version:
0-6, >14: Unsupported ONNX version
7: ONNX export failed on expand, which is not implemented for opset 7. Try exporting with other opset versions.
8-10: Failed to export an ONNX attribute 'prim::Param', since it's not constant, please try to make things (e.g., kernel size) static if possible
11-14: Exporting the operator pad_sequence to ONNX opset version 11 (12, 13, 14) is not supported. Please feel free to request support or submit a pull request on PyTorch GitHub.
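For anyone who wants to reproduce a per-opset table like the one above, here is a rough sketch of a sweep loop. sweep_opsets and fake_export are hypothetical names. fake_export is a stand-in with made-up behavior so the example runs on its own; in practice it would be a function that calls torch.onnx.export(..., opset_version=opset).

```python
def sweep_opsets(try_export, opset_versions):
    """Call try_export(opset) for each version.

    Returns a dict mapping opset -> None on success, or an
    "ExceptionType: message" string if the export raised.
    """
    results = {}
    for opset in opset_versions:
        try:
            try_export(opset)
            results[opset] = None
        except Exception as exc:  # the exporter can raise several exception types
            results[opset] = f"{type(exc).__name__}: {exc}"
    return results


def fake_export(opset):
    # Stand-in only: pretends that opsets below 7 are rejected.
    if opset < 7:
        raise ValueError("opset too old")


report = sweep_opsets(fake_export, range(5, 9))
for opset, error in report.items():
    print(opset, "ok" if error is None else error)
```

Catching the broad Exception is deliberate here, since the goal is to collect every failure message rather than to handle any one of them.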
So, do you have any recommendations on how to convert pre-trained PyTorch models to ONNX format? Maybe it would be useful to add such methods to the Trainer classes?
That is an interesting idea, potentially a useful one, but I don't know anything about ONNX right now and I'm not sure anyone else in the group will want to work on that. Probably won't happen for a little while on that account. Maybe early next year sometime?
May I ask which file you were working with when the error occurred? Thank you very much.
stanza/models/pos/trainer.py; see commit https://github.com/vivkvv/stanza/commit/d30d396950c94499aa4897e2a3539ec720682253#diff-c71256c543b11dc87ee5f934d8f0b8f38fbbdcd703597b4afd6047be662610b6. It happens on line 80: torch.onnx.export(...
Did anybody figure this out? Thank you!
I'm also interested in this.
I have the same need as well.