onnx-tensorflow
Cannot export multilayer LSTM to tensorflow
Describe the bug
I need to convert my PyTorch model to TensorFlow. To do this, I use the PyTorch -> ONNX -> TensorFlow route.
However, I get the following error message when I try to run the prepared TensorFlow model:
ValueError: Dimensions must be equal, but are 6 and 7 for 'rnn/multi_rnn_cell/cell_0/lstm_cell/MatMul' (op: 'MatMul') with input shapes: [50,6], [7,12].
The problem appears when the initial PyTorch LSTM module has more than one layer:
num_layers = 2
self.lstm = nn.LSTM(input_size=4, hidden_size=hidden_num, num_layers=num_layers, batch_first=True)
If there is only one layer, everything works fine.
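The shapes in the error message can be read against the usual TensorFlow LSTM cell layout, where a cell multiplies the concatenation of its input and previous hidden state by a single kernel of shape (input_size + hidden_size, 4 * hidden_size). The sketch below is only arithmetic under that assumption; the helper names are hypothetical and are not part of onnx-tf.

```python
def lstm_kernel_shape(input_size, hidden_size):
    """Kernel shape of an LSTM cell that multiplies concat([x, h]) by one matrix."""
    return (input_size + hidden_size, 4 * hidden_size)


def lstm_matmul_lhs_shape(batch_size, input_size, hidden_size):
    """Shape of concat([x, h]), the left operand of that multiplication."""
    return (batch_size, input_size + hidden_size)


# Values taken from the reproduction script in this issue.
batch_size, input_size, hidden_size = 50, 4, 3

# Layer 0 consumes the 4 input features.
layer0_kernel = lstm_kernel_shape(input_size, hidden_size)

# Layer 1 consumes the hidden_size outputs of layer 0.
layer1_lhs = lstm_matmul_lhs_shape(batch_size, hidden_size, hidden_size)
layer1_kernel = lstm_kernel_shape(hidden_size, hidden_size)

print("layer 0 kernel:", layer0_kernel)
print("layer 1 matmul input:", layer1_lhs)
print("layer 1 kernel:", layer1_kernel)
```

Under this reading, [7,12] is the kernel a first-layer cell would build and [50,6] is the operand a second-layer cell would receive, which would be consistent with a first-layer kernel being applied to second-layer input.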
To Reproduce
Here's the full code:
import torch
import torch.nn as nn
import onnx
import tensorflow as tf
import torch.onnx
from onnx_tf.backend import prepare
import numpy as np
class MyLSTM(nn.Module):
    def __init__(self, hidden_num, num_layers):
        super(MyLSTM, self).__init__()
        self.linear = nn.Linear(200, 4)
        self.lstm = nn.LSTM(input_size=4, hidden_size=hidden_num,
                            num_layers=num_layers, batch_first=True)

    def forward(self, x, h0c0):
        out = x.view(x.shape[0], 1, -1)
        out = self.linear(out)
        out, _ = self.lstm(out.view(x.shape[0], -1, 4), h0c0)
        out = out.view(x.shape[0], -1)
        return out
batch_size = 50
hidden_num = 3
num_layers = 2
model = MyLSTM(hidden_num, num_layers)
# test pytorch model
inputs = torch.zeros(batch_size, 10, 20)
h0 = torch.zeros(num_layers, batch_size, hidden_num)
c0 = torch.zeros(num_layers, batch_size, hidden_num)
out = model(inputs, (h0, c0))
print(out)
# export from pytorch to ONNX
onnx_path = "./lstm.onnx"
torch.onnx.export(model, (inputs, (h0, c0)), onnx_path,
                  dynamic_axes={'input': {0: 'batch'},
                                'h0': {1: 'batch'}, 'c0': {1: 'batch'},
                                'output': {0: 'batch'}},
                  input_names=['input', 'h0', 'c0'], output_names=['output'])
# load ONNX model and create tensorflow representation
onnx_model = onnx.load(onnx_path)
tf_rep = prepare(onnx_model, device='cpu')
# run tensorflow model
inputs = (np.zeros((batch_size, 10, 20), dtype=np.float32),
          np.zeros((num_layers, batch_size, hidden_num), dtype=np.float32),
          np.zeros((num_layers, batch_size, hidden_num), dtype=np.float32))
result = tf_rep.run(inputs) # RUN-TIME ERROR HERE
print(result)
ONNX model file
https://drive.google.com/file/d/1kaCryP-My7_I4Gd2BS37uKUzYB6MkHaG/view?usp=sharing
Python, ONNX, ONNX-TF, Tensorflow version
- Python version: 2.7.18 |Anaconda, Inc.| (default, Apr 23 2020, 17:30:41) [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
- ONNX version: 1.7.0
- ONNX-TF version: 1.6.0
- Tensorflow version: 2.1.0
(this is the only configuration I managed to export with)
Additional context
As I understand, such problems usually arise when the same cells are used for different LSTM layers instead of creating new ones: https://stackoverflow.com/a/48796202
Moreover, the problem doesn't disappear if I split multilayer LSTM into several one-layer LSTMs.
So I believe it is connected to lines 34–37 of rnn_mixin.py: https://github.com/onnx/onnx-tensorflow/blob/c63d4351c7752a769cdc9a1bfcf79ffd140e0e6a/onnx_tf/handlers/backend/rnn_mixin.py#L34-L37
I should also note that the problem is specific to the onnx-tensorflow module, because exporting from PyTorch to ONNX and running the exported model back with onnxruntime works as expected.
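The cell-reuse hypothesis above can be illustrated without TensorFlow. This is a minimal sketch, assuming a cell creates its kernel lazily on first use; `ToyCell` and `run_stack` are hypothetical stand-ins and do not reproduce the onnx-tf code. It contrasts one cell object repeated across layers with a fresh cell per layer.

```python
class ToyCell:
    """Hypothetical stand-in for an RNN cell that creates its kernel on first use."""

    def __init__(self, hidden_size):
        self.hidden_size = hidden_size
        self.kernel_shape = None  # decided when the cell first sees an input

    def __call__(self, input_width):
        lhs_width = input_width + self.hidden_size  # width of concat([x, h])
        if self.kernel_shape is None:
            self.kernel_shape = (lhs_width, 4 * self.hidden_size)
        if self.kernel_shape[0] != lhs_width:
            raise ValueError(
                "Dimensions must be equal, but are %d and %d"
                % (lhs_width, self.kernel_shape[0])
            )
        return self.hidden_size  # width of the output passed to the next layer


def run_stack(cells, input_width):
    """Feed a width through each cell in order, as a stacked RNN would."""
    width = input_width
    for cell in cells:
        width = cell(width)
    return width


hidden_size, num_layers = 3, 2

# One object repeated: layer 1 reuses the kernel that layer 0 created.
shared = [ToyCell(hidden_size)] * num_layers
try:
    run_stack(shared, input_width=4)
    shared_error = None
except ValueError as exc:
    shared_error = str(exc)

# A fresh object per layer: each layer builds a kernel for its own input width.
fresh = [ToyCell(hidden_size) for _ in range(num_layers)]
fresh_output_width = run_stack(fresh, input_width=4)

print("shared cells:", shared_error)
print("fresh cells output width:", fresh_output_width)
```

With the values from this issue, the shared variant fails on the second layer with the same 6-versus-7 mismatch, while the per-layer variant runs through. Whether onnx-tf hits this exact path is an assumption, not something the sketch verifies.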
@chist I ran into the same problem. Did you figure out a solution?
I can convert both single-layer and multilayer LSTMs to TensorFlow (onnx 1.7.0, onnx-tf 1.7.0). However, with a multilayer LSTM the onnx and onnx-tf outputs differ, whereas with a single-layer LSTM they match.
@chist have you figured out a solution?
I probably hit the same error when I tried to load several instances of an LSTM model in one session. I believe the problem is that rnn_cell is a class-level variable shared by all instances:
https://github.com/onnx/onnx-tensorflow/blob/c63d4351c7752a769cdc9a1bfcf79ffd140e0e6a/onnx_tf/handlers/backend/rnn_mixin.py#L28
So when you create several LSTM instances, rnn_cell is initialized only once.
I suspect this is not the expected behavior. @chinhuang007 Can you take a look please?
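The class-variable concern raised above can be sketched in plain Python. Both handler classes below are hypothetical and only mimic the pattern described in the comment (a cell stored on the class and created once); they are not the onnx-tf implementation, and a dict stands in for the cell object.

```python
class SharedCellHandler:
    """Hypothetical handler that keeps its cell in a class attribute."""

    rnn_cell = None  # one slot shared by every instance of the class

    def get_cell(self, hidden_size):
        # Only the first caller creates the cell; later callers get the same one.
        if SharedCellHandler.rnn_cell is None:
            SharedCellHandler.rnn_cell = {"hidden_size": hidden_size}
        return SharedCellHandler.rnn_cell


class PerInstanceCellHandler:
    """Hypothetical alternative in which each instance owns its cell."""

    def __init__(self):
        self.rnn_cell = None

    def get_cell(self, hidden_size):
        if self.rnn_cell is None:
            self.rnn_cell = {"hidden_size": hidden_size}
        return self.rnn_cell


# Two LSTM instances with different sizes, as in "several instances in one session".
shared_a = SharedCellHandler().get_cell(hidden_size=3)
shared_b = SharedCellHandler().get_cell(hidden_size=256)

own_a = PerInstanceCellHandler().get_cell(hidden_size=3)
own_b = PerInstanceCellHandler().get_cell(hidden_size=256)

print("shared handlers return the same cell:", shared_a is shared_b)
print("second shared cell hidden_size:", shared_b["hidden_size"])
print("second per-instance cell hidden_size:", own_b["hidden_size"])
```

In the shared variant the second instance silently receives a cell configured for the first one, which is the kind of behavior the comment describes.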
The bug still occurs. I cannot run the exported model with a multilayer LSTM.
- Python 3.7 / 3.8
- ONNX-TF 1.9.0
- ONNX 1.10.1
- Tensorflow 2.6.0
Error:
InvalidArgumentError: Matrix size-incompatible: In[0]: [219,512], In[1]: [336,1024]
[[{{node LSTM_2ab5f684/rnn/while/body/_58/LSTM_ad6f857e/rnn/while/rnn/multi_rnn_cell/cell_0/lstm_cell/BiasAdd}}]] [Op:__inference___call___2981]
Function call stack:
__call__
We can reproduce this bug in a Colab environment (both GPU and CPU). Here is the reproduction code:
!pip install onnx-tf
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable
import onnx
import onnx_tf
from onnx_tf.backend import prepare
import numpy as np
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
G = nn.LSTM(input_size=80,
            hidden_size=256,
            num_layers=3,  # bug at 3
            dropout=0,
            bidirectional=False,
            batch_first=True)
G = G.to(device)
aus = torch.Tensor(np.zeros((219,18,80))).to(device)
torch.onnx.export(G,
                  args=(aus),
                  f="audio_G.onnx",
                  input_names=["au"],
                  output_names=["a","b","c"],
                  opset_version=12)
model = onnx.load('audio_G.onnx')
tf_rep = prepare(model)
o,w,e = tf_rep.run((aus.cpu()))