onnx-tensorflow

In RNN or LSTM, input_size = 1 and/or hidden_size = 1 fails

Open · londumas opened this issue 4 years ago · 1 comment

Apparently, setting input_size = 1 and/or hidden_size = 1 in RNNs or LSTMs fails when the exported ONNX model is converted with onnx-tf. Here is a minimal script that reproduces the problem, followed by its output.

import numpy as np

import torch
import onnxruntime
import onnx
import onnx_tf
import tensorflow as tf

batch_size = 1
seq_length = 1
input_size = 1   # the conversion fails when this is 1
hidden_size = 1  # the conversion fails when this is 1

path_to_save = 'model'

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

class Net(torch.nn.Module):

    def __init__(self, input_size, hidden_size):
        super(Net, self).__init__()

        self.rnn = torch.nn.RNN(input_size=input_size, hidden_size=hidden_size)
        
    def forward(self, x):
        
        x, _ = self.rnn(x)

        return x

x = np.random.randn(seq_length, batch_size, input_size).astype(np.float32)

model = Net(input_size=input_size, hidden_size=hidden_size)

y_torch = to_numpy(model( torch.FloatTensor(x) ))

torch.save(model, "{}.pth".format(path_to_save))

model = torch.load("{}.pth".format(path_to_save))

torch.onnx.export(model,
    torch.FloatTensor(x),
    "{}.onnx".format(path_to_save),
    input_names = ['input'],
    output_names = ['output'],
)

ort_session = onnxruntime.InferenceSession("{}.onnx".format(path_to_save))

ort_inputs = {
    ort_session.get_inputs()[0].name: x,
}
y = ort_session.run(None, ort_inputs)[0]

print("Maximum difference: ", np.absolute(y-y_torch).max(), ( (y-y_torch)**2 ).max() )

model = onnx.load('{}.onnx'.format(path_to_save))

model = onnx_tf.backend.prepare(model)

model.export_graph('{}.pb'.format(path_to_save))

model =  tf.saved_model.load("{}.pb/".format(path_to_save))

y = model(input=x)[0].numpy()

print("Maximum difference: ", np.absolute(y-y_torch).max(), ( (y-y_torch)**2 ).max() )

Here is the output

2021-07-19 17:21:32.968331: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-07-19 17:21:32.968425: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-07-19 17:21:34.586576: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-07-19 17:21:34.609232: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:04:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.582GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-07-19 17:21:34.611012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 1 with properties: 
pciBusID: 0000:87:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.582GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-07-19 17:21:34.612330: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 2 with properties: 
pciBusID: 0000:88:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.582GHz coreCount: 28 deviceMemorySize: 10.92GiB deviceMemoryBandwidth: 451.17GiB/s
2021-07-19 17:21:34.612474: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-07-19 17:21:34.614871: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-07-19 17:21:34.617089: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-07-19 17:21:34.617479: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-07-19 17:21:34.620014: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-07-19 17:21:34.621334: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-07-19 17:21:34.621469: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2021-07-19 17:21:34.621492: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-07-19 17:21:34.621861: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-19 17:21:34.631990: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2394345000 Hz
2021-07-19 17:21:34.633780: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x829b080 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-19 17:21:34.633812: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-07-19 17:21:34.635488: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-19 17:21:34.635516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      
/home/<me>/.local/lib/python3.7/site-packages/torch/onnx/symbolic_opset9.py:2099: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with RNN_TANH can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model. 
  "or define the initial states (h0/c0) as inputs of the model. ")
WARNING:tensorflow:From /home/<me>/.local/lib/python3.7/site-packages/onnx_tf/handlers/backend/rnn_mixin.py:35: BasicRNNCell.__init__ (from tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.SimpleRNNCell, and will be replaced by that in Tensorflow 2.0.
WARNING:tensorflow:From /home/<me>/.local/lib/python3.7/site-packages/onnx_tf/handlers/backend/rnn_mixin.py:37: MultiRNNCell.__init__ (from tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.
WARNING:tensorflow:From /home/<me>/.local/lib/python3.7/site-packages/onnx_tf/handlers/backend/rnn_mixin.py:45: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `keras.layers.RNN(cell)`, which is equivalent to this API
WARNING:tensorflow:From /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/keras/layers/legacy_rnn/rnn_cell_impl.py:456: Layer.add_variable (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.add_weight` method instead.
Maximum difference:  2.9802322e-08 8.881784e-16
Traceback (most recent call last):
  File "input-output-not-1.py", line 62, in <module>
    model.export_graph('{}.pb'.format(path_to_save))
  File "/home/<me>/.local/lib/python3.7/site-packages/onnx_tf/backend_rep.py", line 116, in export_graph
    **self.signatures))
  File "/home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 1167, in get_concrete_function
    concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
  File "/home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 1073, in _get_concrete_function_garbage_collected
    self._initialize(args, kwargs, add_initializers_to=initializers)
  File "/home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 697, in _initialize
    *args, **kwds))
  File "/home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2855, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3213, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3075, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 986, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 600, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3735, in bound_method_wrapper
    return wrapped_fn(*args, **kwargs)
  File "/home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 973, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /home/<me>/.local/lib/python3.7/site-packages/onnx_tf/backend_tf_module.py:98 __call__  *
        output_ops = self.backend._onnx_node_to_tensorflow_op(onnx_node,
    /home/<me>/.local/lib/python3.7/site-packages/onnx_tf/backend.py:328 _onnx_node_to_tensorflow_op  *
        return handler.handle(node, tensor_dict=tensor_dict, strict=strict)
    /home/<me>/.local/lib/python3.7/site-packages/onnx_tf/handlers/handler.py:59 handle  *
        return ver_handle(node, **kwargs)
    /home/<me>/.local/lib/python3.7/site-packages/onnx_tf/handlers/backend/rnn.py:190 version_7  *
        return cls._common(node, **kwargs)
    /home/<me>/.local/lib/python3.7/site-packages/onnx_tf/handlers/backend/rnn.py:162 _common  *
        outputs, states = cls.rnn(x, tf.compat.v1.nn.rnn_cell.BasicRNNCell,
    /home/<me>/.local/lib/python3.7/site-packages/onnx_tf/handlers/backend/rnn_mixin.py:45 rnn  *
        outputs, states = tf.compat.v1.nn.dynamic_rnn(cell_fw, x, **rnn_kwargs)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py:324 new_func  **
        return func(*args, **kwargs)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py:201 wrapper
        return target(*args, **kwargs)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/ops/rnn.py:691 dynamic_rnn
        dtype=dtype)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/ops/rnn.py:894 _dynamic_rnn_loop
        swap_memory=swap_memory)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py:2696 while_loop
        back_prop=back_prop)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/ops/while_v2.py:196 while_loop
        add_control_dependencies=add_control_dependencies)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py:986 func_graph_from_py_func
        func_outputs = python_func(*func_args, **func_kwargs)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/ops/while_v2.py:174 wrapped_body
        outputs = body(*_pack_sequence_as(orig_loop_vars, args))
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/ops/rnn.py:865 _time_step
        (output, new_state) = call_cell()
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/ops/rnn.py:851 <lambda>
        call_cell = lambda: cell(input_t, state)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/keras/layers/legacy_rnn/rnn_cell_impl.py:244 __call__
        return super(RNNCell, self).__call__(inputs, state)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/keras/legacy_tf_layers/base.py:547 __call__
        outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:985 __call__
        outputs = call_fn(inputs, *args, **kwargs)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/keras/layers/legacy_rnn/rnn_cell_impl.py:1320 call
        cur_inp, new_state = cell(cur_inp, cur_state)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/keras/layers/legacy_rnn/rnn_cell_impl.py:386 __call__
        self, inputs, state, scope=scope, *args, **kwargs)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/keras/legacy_tf_layers/base.py:547 __call__
        outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:982 __call__
        self._maybe_build(inputs)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:2643 _maybe_build
        self.build(input_shapes)  # pylint:disable=not-callable
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/keras/utils/tf_utils.py:323 wrapper
        output_shape = fn(instance, input_shape)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/keras/layers/legacy_rnn/rnn_cell_impl.py:456 build
        shape=[input_depth + self._num_units, self._num_units])
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py:324 new_func
        return func(*args, **kwargs)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:2233 add_variable
        return self.add_weight(*args, **kwargs)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/keras/legacy_tf_layers/base.py:460 add_weight
        **kwargs)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:614 add_weight
        caching_device=caching_device)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py:750 _add_variable_with_custom_getter
        **kwargs_for_getter)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py:1572 get_variable
        aggregation=aggregation)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py:1315 get_variable
        aggregation=aggregation)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py:552 get_variable
        return custom_getter(**custom_getter_kwargs)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/ops/variable_scope.py:2028 wrapped_custom_getter
        return custom_getter(functools.partial(old_getter, getter), *args, **kwargs)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/keras/layers/legacy_rnn/rnn_cell_impl.py:247 _rnn_get_variable
        variable = getter(*args, **kwargs)
    /home/<me>/.local/lib/python3.7/site-packages/onnx_tf/handlers/backend/rnn.py:78 _custom_getter
        weight_var.assign(tf.concat([new_w, new_r], 0))
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py:201 wrapper
        return target(*args, **kwargs)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:1654 concat
        return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py:1222 concat_v2
        "ConcatV2", values=values, axis=axis, name=name)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:744 _apply_op_helper
        attrs=attr_protos, op_def=op_def)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py:593 _create_op_internal
        compute_device)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:3485 _create_op_internal
        op_def=op_def)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1975 __init__
        control_input_ops, op_def)
    /home/<me>/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1815 _create_c_op
        raise ValueError(str(e))

    ValueError: Can't concatenate scalars (use tf.stack instead) for '{{node concat}} = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32](transpose, transpose_1, concat/axis)' with input shapes: [], [], [].
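
A possible reading of this error, offered as a sketch rather than a confirmed diagnosis: the traceback shows the handler building the cell kernel with `tf.concat([new_w, new_r], 0)`, and both operands arrive with shape `[]`. That would happen if the ONNX `W` and `R` tensors were squeezed without an explicit axis, since every size-1 dimension (not only `num_directions`) would then be dropped. The axis-less squeeze is an assumption, and the helpers `squeeze_shape` and `kernel_shape` below are hypothetical names used only for illustration. The code tracks shapes in plain Python and does not call TensorFlow.

```python
# Shape-only sketch (no TensorFlow needed). ASSUMPTION: the handler squeezes
# W and R with no axis argument before transposing and concatenating them.

def squeeze_shape(shape, axis=None):
    """Shape after removing size-1 dims: all of them, or only `axis`."""
    if axis is None:
        return tuple(d for d in shape if d != 1)
    if shape[axis] != 1:
        raise ValueError("cannot squeeze axis %d of shape %r" % (axis, shape))
    return tuple(d for i, d in enumerate(shape) if i != axis)


def kernel_shape(input_size, hidden_size, axis=None):
    """Shape of concat([W^T, R^T], axis=0) for a one-direction ONNX RNN node.

    Returns None when the two operands could not be concatenated.
    """
    # ONNX RNN: W is [num_directions, hidden_size, input_size],
    #           R is [num_directions, hidden_size, hidden_size].
    w = squeeze_shape((1, hidden_size, input_size), axis)[::-1]   # transpose
    r = squeeze_shape((1, hidden_size, hidden_size), axis)[::-1]  # transpose
    # Concatenating along axis 0 needs rank >= 1, equal ranks,
    # and matching trailing dimensions.
    if len(w) == 0 or len(w) != len(r) or w[1:] != r[1:]:
        return None
    return (w[0] + r[0],) + w[1:]


# (input_size, hidden_size) pairs: only (3, 3) survives an axis-less squeeze.
for sizes in [(1, 1), (1, 3), (3, 1), (3, 3)]:
    print(sizes,
          "squeeze all:", kernel_shape(*sizes),
          "| squeeze axis 0 only:", kernel_shape(*sizes, axis=0))
```

Under that assumption, squeezing only the leading `num_directions` axis keeps both operands two-dimensional and yields the `[input_depth + num_units, num_units]` kernel shape that `BasicRNNCell.build` requests in the traceback. Whether this matches the actual onnx-tf code path has not been verified in this thread.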


londumas, Jul 19 '21 15:07

How do you solve this problem?

YuriSizuku, Jan 24 '24 06:01