finn-base
Issue with inferring shapes in example model
I create an ONNX file with the following sample script and an accompanying input.txt:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

# Define simple MLP architecture
class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        # Two layer MLP, ingesting a single frame of BLM data
        self.layer1 = nn.Linear(259, 128)
        self.layer2 = nn.Linear(128, 259*2)

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = torch.sigmoid(self.layer2(x))
        return x

# Inference function
def run_inference() -> None:
    # Instantiate the MLP model
    model = MLP()
    # Fix random seed
    np.random.seed(0)
    # Generate weight tensors
    w1 = torch.tensor(np.random.normal(loc=0, scale=0.1, size=(128, 259)).astype(np.single))
    b1 = torch.tensor(np.random.normal(loc=0, scale=0.1, size=128).astype(np.single))
    w2 = torch.tensor(np.random.normal(loc=0, scale=0.1, size=(259*2, 128)).astype(np.single))
    b2 = torch.tensor(np.random.normal(loc=0, scale=0.1, size=259*2).astype(np.single))
    # Single inference step
    with torch.no_grad():
        # Load the fixed weights
        model.layer1.weight = nn.parameter.Parameter(w1)
        model.layer1.bias = nn.parameter.Parameter(b1)
        model.layer2.weight = nn.parameter.Parameter(w2)
        model.layer2.bias = nn.parameter.Parameter(b2)
        # Load the input data and add a batch dimension
        input_data = torch.from_numpy(np.loadtxt('input.txt', dtype=np.single)).unsqueeze(0)
        # Inference
        out = model(input_data)
        # Save in ONNX format
        torch.onnx.export(model,       # model being run
                          input_data,  # model input (or a tuple for multiple inputs)
                          "MLP.onnx")

if __name__ == '__main__':
    run_inference()
```
(the produced ONNX file is available at: https://drive.google.com/file/d/1wt6ub3cChvPD-XM4-7keuTy5dC5wdVZk/view?usp=sharing)
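For reproduction, input.txt only needs to hold 259 values in a format readable by `np.loadtxt`; a minimal sketch to generate a placeholder input (the values here are arbitrary, not the original data) could be:

```python
import numpy as np

# Write 259 placeholder values so that np.loadtxt('input.txt') yields a vector
# matching the 259-wide input of layer1 (values are arbitrary).
np.random.seed(1)
np.savetxt('input.txt', np.random.normal(loc=0, scale=1, size=259).astype(np.single))
```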
It seems that `infer_shapes`, which runs as part of the cleanup, fails:
```
(fastml) mac-137349:validation jmitrevs$ qonnx-cleanup MLP.onnx
(fastml) mac-137349:validation jmitrevs$ qonnx-exec MLP_clean.onnx
Traceback (most recent call last):
  File "/Users/jmitrevs/fastml/bin/qonnx-exec", line 33, in <module>
    sys.exit(load_entry_point('qonnx', 'console_scripts', 'qonnx-exec')())
  File "/Users/jmitrevs/work/qonnx/src/qonnx/util/exec_qonnx.py", line 43, in main
    clize.run(exec_qonnx)
  File "/Users/jmitrevs/fastml/lib/python3.9/site-packages/sigtools/modifiers.py", line 158, in __call__
    return self.func(*args, **kwargs)
  File "/Users/jmitrevs/fastml/lib/python3.9/site-packages/clize/runner.py", line 363, in run
    ret = cli(*args)
  File "/Users/jmitrevs/fastml/lib/python3.9/site-packages/clize/runner.py", line 220, in __call__
    return func(*posargs, **kwargs)
  File "/Users/jmitrevs/work/qonnx/src/qonnx/util/exec_qonnx.py", line 35, in exec_qonnx
    odict = execute_onnx(model, idict)
  File "/Users/jmitrevs/work/finn-base/src/finn/core/onnx_exec.py", line 147, in execute_onnx
    raise Exception("Found unspecified tensor shapes, try infer_shapes")
Exception: Found unspecified tensor shapes, try infer_shapes
```
The problem is that `model.get_tensor_shape('Gemm_0_param0')` returns `[]`. I do not understand the behavior.
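The check above can be reproduced directly in Python; a small sketch, assuming the finn-base import path `finn.core.modelwrapper` and the cleaned model file MLP_clean.onnx:

```python
from finn.core.modelwrapper import ModelWrapper

# Load the cleaned model and query the shape of the first Gemm weight tensor;
# on the affected setup this prints [] instead of the expected shape.
model = ModelWrapper("MLP_clean.onnx")
print(model.get_tensor_shape("Gemm_0_param0"))
```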
Thanks for flagging this, Jovan. I had a quick look at the test case, and it actually looks like the problem is not inside `finn-base` but rather in `onnx.shape_inference.infer_shapes`, which we use under the hood to do shape inference for non-custom ops. I was able to reproduce the same problem in a way that sidesteps `finn-base` completely:
```
In [1]: from onnx.shape_inference import infer_shapes

In [2]: import onnx

In [3]: ret0 = onnx.load("MLP.onnx")

In [4]: ret1 = infer_shapes(ret0)

In [5]: onnx.save(ret1, "mlp-with-shapes.onnx")
```
...and examining mlp-with-shapes.onnx in Netron I can confirm that the shapes are missing. The good news is that by upgrading to `onnx==1.11.0` I was able to get the right shape inference behavior, so this must be a bug that has been fixed in more recent versions. I'll re-run the test suite with `onnx==1.11.0` and, if it doesn't break anything, push a fix for this to the `finn-base` and `qonnx` repos.
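The result of shape inference can also be checked programmatically rather than in Netron, by listing which tensors received a ValueInfo entry; a small sketch along those lines, reusing the file name from the session above:

```python
import onnx
from onnx.shape_inference import infer_shapes

# Run ONNX shape inference and print every tensor that got a shape annotation
ret1 = infer_shapes(onnx.load("MLP.onnx"))
for vi in ret1.graph.value_info:
    dims = [d.dim_value for d in vi.type.tensor_type.shape.dim]
    print(vi.name, dims)
```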
Upgrading onnx doesn't seem to solve the problem on my Mac. I updated the onnx version but still hit the same error:
```
(fastml) mac-137349:Downloads jmitrevs$ qonnx-cleanup MLP.onnx
(fastml) mac-137349:Downloads jmitrevs$ qonnx-exec MLP_clean.onnx
Traceback (most recent call last):
  File "/Users/jmitrevs/fastml/bin/qonnx-exec", line 33, in <module>
    sys.exit(load_entry_point('qonnx', 'console_scripts', 'qonnx-exec')())
  File "/Users/jmitrevs/work/qonnx/src/qonnx/util/exec_qonnx.py", line 43, in main
    clize.run(exec_qonnx)
  File "/Users/jmitrevs/fastml/lib/python3.9/site-packages/sigtools/modifiers.py", line 158, in __call__
    return self.func(*args, **kwargs)
  File "/Users/jmitrevs/fastml/lib/python3.9/site-packages/clize/runner.py", line 363, in run
    ret = cli(*args)
  File "/Users/jmitrevs/fastml/lib/python3.9/site-packages/clize/runner.py", line 220, in __call__
    return func(*posargs, **kwargs)
  File "/Users/jmitrevs/work/qonnx/src/qonnx/util/exec_qonnx.py", line 35, in exec_qonnx
    odict = execute_onnx(model, idict)
  File "/Users/jmitrevs/work/finn-base/src/finn/core/onnx_exec.py", line 147, in execute_onnx
    raise Exception("Found unspecified tensor shapes, try infer_shapes")
Exception: Found unspecified tensor shapes, try infer_shapes
(fastml) mac-137349:Downloads jmitrevs$ pip list | grep onnx
onnx                    1.11.0
onnxconverter-common    1.8.1
onnxruntime             1.11.1
qonnx                   0.0.post1.dev104+gc86147e.d20220531  /Users/jmitrevs/work/qonnx/src
tf2onnx                 1.10.0                               /Users/jmitrevs/work/tensorflow-onnx
```
I had only used Netron to check that the shapes appeared for the intermediate tensors, but if I use `qonnx-exec` I actually see the same problem. The root of this seems to be as follows: even though the weight and bias tensors for the `Gemm` nodes have initializers, no ValueInfo is generated for these tensors during shape inference. Since we rely on ValueInfo to get shape information, the `Found unspecified tensor shapes` exception is thrown during execution.
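This can be seen directly on the model: the `Gemm` weight and bias tensors appear in the graph's initializers, but no matching ValueInfo entry comes out of shape inference. A sketch of that check, assuming the cleaned file MLP_clean.onnx:

```python
import onnx
from onnx.shape_inference import infer_shapes

model = infer_shapes(onnx.load("MLP_clean.onnx"))
value_info_names = {vi.name for vi in model.graph.value_info}
# Initializers (the Gemm weights/biases) that did not get a ValueInfo entry;
# these are the tensors for which get_tensor_shape() later returns []
missing = [init.name for init in model.graph.initializer
           if init.name not in value_info_names]
print(missing)
```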
It looks like this issue has been around for a while and is related to initializers not being listed as inputs:
https://github.com/onnx/onnx/issues/4102
https://github.com/onnx/onnx/issues/2874
...but the following merged PR was supposed to fix this for 1.11.0 and later: https://github.com/onnx/onnx/pull/2901
I'm not entirely sure why the fix hasn't kicked in here. I'll have a closer look.
I haven't been able to find out why ONNX PR #2901 does not solve this issue, so I just added a workaround in `ModelWrapper` that applies a fix for this while loading the model. Since `finn-base` is scheduled to be sunset, I did this directly in a new `qonnx` branch:
https://github.com/fastmachinelearning/qonnx/tree/feature/finn_base_migration
@jmitrevs could you give this a try and see if it resolves the issue for you? I was able to run `qonnx-cleanup` and `qonnx-exec` without errors on the MLP.onnx you shared.
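To illustrate the general idea of such a workaround (this is only a sketch of the approach, not the actual code on that branch): at load time, the missing ValueInfo entries can be rebuilt from the dtypes and dims already stored in the initializers, for example:

```python
import onnx
from onnx import helper

def add_value_info_for_initializers(model: onnx.ModelProto) -> None:
    """Sketch of a load-time workaround: give every initializer that lacks a
    ValueInfo entry one, built from the initializer's own dtype and dims."""
    existing = {vi.name for vi in model.graph.value_info}
    for init in model.graph.initializer:
        if init.name not in existing:
            vi = helper.make_tensor_value_info(init.name, init.data_type, list(init.dims))
            model.graph.value_info.append(vi)
```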
I believe it fixed the problem. I am now running into another problem, but I think it's unrelated. (I will double-check this afternoon.)
Confirmed: my script now works (after fixing an unrelated bug).
@maltanar What's the fix for this issue when using the latest finn-base dev branch? (I tried building a Docker image with onnx>=1.11.0, but it didn't fix the issue.)