
Optimising ONNX Graph either takes too long or doesn't seem to work


Using the GPT2 Notebook, I am trying to convert a gpt2 model to an optimised ONNX graph and I'm stuck at what seems to be random behaviour.

The export to ONNX works fine. However, while optimising the ONNX graph, I usually see warnings like WARNING:symbolic_shape_infer:Cannot determine if Reshape_560_o0__d1 - sequence < 0 repeated over and over until I have to stop the kernel.
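
For reference, the warning comes from ONNX Runtime's symbolic shape inference, which the optimisation step runs on the graph. Something like the sketch below should reproduce that step on its own (the file name and arguments are my assumptions, not the notebook's exact code):

import onnx
from onnxruntime.tools.symbolic_shape_infer import SymbolicShapeInference

# Load the exported model (the file name "gpt2.onnx" is an assumption).
model = onnx.load("gpt2.onnx")

# auto_merge=True lets the tool merge symbolic dimensions it cannot compare,
# which is typically what triggers the "Cannot determine if ... < 0" warnings.
inferred = SymbolicShapeInference.infer_shapes(model, auto_merge=True, guess_output_rank=True)
onnx.save(inferred, "gpt2-shapes.onnx")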

It did work once or twice (in the same environment) and took about 30 seconds, so I have no idea what changed.

I barely even changed the code. I'm just following the notebook.

What does the warning mean, and how can I get back to a stable optimisation?

accountForIssues · Jul 04 '22 10:07

Which version of PyTorch are you using?

pommedeterresautee · Jul 04 '22 20:07

I've tried with both 1.11 and 1.12.

Is there a recommended way or guide for setting up an environment to use this library? Maybe there is a package conflict somewhere that I'm overlooking.

accountForIssues · Jul 04 '22 21:07

I just reran the notebook and had no issue. I imagine it's a dependency version problem. The ones I would check are those related to ONNX and PyTorch, as they are the only two things involved in the ONNX graph optimisation.

❯ pip list | grep onnx
onnx                      1.12.0
onnx-graphsurgeon         0.3.19
onnxconverter-common      1.9.0
onnxruntime-gpu           1.12.0
onnxruntime-tools         1.7.0
tf2onnx                   1.11.1
❯ pip list | grep torch
pytorch-quantization      2.1.2
torch                     1.11.0+cu113
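
If you want to compare quickly, a small sketch like this (distribution names taken from the list above) prints the same versions from inside the notebook:

from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the pip list output above.
for pkg in ("torch", "onnx", "onnxruntime-gpu", "onnxconverter-common",
            "onnx-graphsurgeon", "onnxruntime-tools", "tf2onnx", "pytorch-quantization"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")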

pommedeterresautee · Jul 05 '22 19:07

Maybe solved.

I created a new Docker image using the latest CUDA runtime and installed each package.

I can confirm that the latest torch causes this issue, but I do remember getting this error with an older torch image as well, so I definitely think another package could also be contributing to it.

In any case, I will keep testing to see if it breaks again. Hopefully you will come across this as well when you update the Docker image and can solve it :)
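
A possible stop-gap is a guard like the sketch below at the top of the notebook; the 1.11 cut-off is only an assumption based on what worked for me, not an official requirement:

import torch

# The 1.11 cut-off is an assumption based on the versions discussed in this
# thread, not a documented requirement of transformer-deploy.
major, minor = (int(x) for x in torch.__version__.split(".")[:2])
assert (major, minor) <= (1, 11), (
    f"torch {torch.__version__} detected; the optimisation step only worked "
    "reliably here with torch 1.11.x"
)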

accountForIssues · Jul 06 '22 12:07