RandomStandardNormal cannot be converted

Open KodeWorker opened this issue 5 years ago • 3 comments

I get error messages when converting a tf.keras model with keras2onnx. The conversion fails and produces an invalid *.onnx file. Script to reproduce:

from tensorflow.keras.layers import Input, Dense, Lambda, LeakyReLU
from tensorflow.keras.models import Model
import numpy as np
import keras2onnx
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

batch_size = 32
original_dim = 1000
latent_dim = 8
intermediate_dim = 256
epsilon_std = 1.0

class Sampler:
    def __init__(self, **kwargs):
        self.batch_size = kwargs.get('batch_size')
        self.latent_dim = kwargs.get('latent_dim')
        self.epsilon_std = kwargs.get('epsilon_std')

    def sampling(self, args):
        z_mean, z_log_var = args
        epsilon = tf.random_normal(shape=(self.batch_size, self.latent_dim), mean=0., stddev=self.epsilon_std)
        return z_mean + tf.exp(z_log_var / 2) * epsilon

x = Input(batch_shape=(batch_size, original_dim))
h_ = Dense(intermediate_dim)(x)
h = LeakyReLU(alpha=0.2)(h_)

z_mean = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)

sampler = Sampler(batch_size=batch_size, latent_dim=latent_dim, epsilon_std=epsilon_std)
z = Lambda(sampler.sampling, output_shape=(latent_dim,))([z_mean, z_log_var])

decoder_h = Dense(intermediate_dim)
h_decoded = LeakyReLU(alpha=0.2)(decoder_h(z))

decoder_mean = Dense(original_dim)
x_decoded_mean = LeakyReLU(alpha=0.2)(decoder_mean(h_decoded))
model = Model(x, x_decoded_mean)

save_model_file = "mini.onnx"
onnx_model = keras2onnx.convert_keras(model)
keras2onnx.save_model(onnx_model, save_model_file)

The messages are shown below:

(TensorFlow2) D:\code\onnx_conversion\fan\vae>python convert_v1.py
2020-04-20 17:39:57.340114: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
WARNING:tensorflow:From C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\envs\TensorFlow2\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1635: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
2020-04-20 17:40:05.545980: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-04-20 17:40:06.022346: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce 940MX computeCapability: 5.0
coreClock: 0.8605GHz coreCount: 4 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 37.33GiB/s
2020-04-20 17:40:06.042278: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-20 17:40:06.063008: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-20 17:40:06.084942: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-20 17:40:06.102181: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-20 17:40:06.125014: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-20 17:40:06.143992: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-20 17:40:06.177553: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-20 17:40:06.192271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-20 17:40:06.200685: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-04-20 17:40:06.213353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce 940MX computeCapability: 5.0
coreClock: 0.8605GHz coreCount: 4 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 37.33GiB/s
2020-04-20 17:40:06.233324: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-20 17:40:06.245095: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-20 17:40:06.256005: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-20 17:40:06.267765: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-20 17:40:06.279495: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-20 17:40:06.289576: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-20 17:40:06.300944: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-20 17:40:06.312736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-20 17:40:08.926805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-20 17:40:08.937186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0
2020-04-20 17:40:08.945151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
2020-04-20 17:40:08.952519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1373 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
tf executing eager_mode: False
tf.keras model eager_mode: False
The tf.op node lambda/random_normal/RandomStandardNormal of type RandomStandardNormal cannot be converted
There is an error(<class 'AssertionError'>) happened during optimizing on the converted model!

Traceback (most recent call last):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\envs\TensorFlow2\lib\site-packages\keras2onnx-1.7.0-py3.6.egg\keras2onnx\topology.py", line 317, in convert_topology
    target_opset=container.target_opset)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\envs\TensorFlow2\lib\site-packages\onnxconverter_common-1.7.0-py3.6.egg\onnxconverter_common\optimizer.py", line 1621, in optimize_onnx_graph
    initializers)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\envs\TensorFlow2\lib\site-packages\onnxconverter_common-1.7.0-py3.6.egg\onnxconverter_common\optimizer.py", line 302, in build_from_onnx
    assert var_ == '' or var_ in inputs
AssertionError

The maximum opset needed by this model is only 9.

KodeWorker (Apr 20 '20 10:04)

The conversion implementation for this op is missing from the source code. May I borrow some pieces of your code for unit testing?

wenbingl (Apr 20 '20 18:04)

@wenbingl Thank you for the reply. Sure, you are welcome to use the code.

KodeWorker (Apr 21 '20 01:04)

Sorry for the late update: the latest code now supports RandomStandardNormal. Note, however, that these random ops generate different results across inference runtimes, even when the seed is the same.

wenbingl (Apr 24 '20 16:04)
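
One possible way to sidestep the runtime-dependent random ops mentioned in the last comment is to draw epsilon outside the model and pass it in as an ordinary input, so the exported graph would contain no RandomStandardNormal node. This was not proposed in the thread; it is only a minimal sketch of the idea in plain Python. `sample_epsilon` and `reparameterize` are hypothetical helper names (not part of keras2onnx or TensorFlow), and nested lists stand in for tensors.

```python
import math
import random


def sample_epsilon(seed, batch_size, latent_dim, epsilon_std=1.0):
    """Draw the noise once, outside the model, from a seeded generator."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, epsilon_std) for _ in range(latent_dim)]
            for _ in range(batch_size)]


def reparameterize(z_mean, z_log_var, epsilon):
    """Same formula as Sampler.sampling, but epsilon is supplied by the caller."""
    return [[m + math.exp(v / 2.0) * e
             for m, v, e in zip(m_row, v_row, e_row)]
            for m_row, v_row, e_row in zip(z_mean, z_log_var, epsilon)]


# Hypothetical toy values: a batch of 2 with a latent dimension of 3.
z_mean = [[0.0, 1.0, -1.0], [0.5, 0.5, 0.5]]
z_log_var = [[0.0, 0.0, 0.0], [-1.0, 0.0, 1.0]]

# The same seed yields the same noise, and therefore the same z,
# independent of any inference runtime's random number generator.
eps_a = sample_epsilon(seed=42, batch_size=2, latent_dim=3)
eps_b = sample_epsilon(seed=42, batch_size=2, latent_dim=3)

z_a = reparameterize(z_mean, z_log_var, eps_a)
z_b = reparameterize(z_mean, z_log_var, eps_b)
print(z_a == z_b)
```

In the Keras model above, this would presumably mean adding a second Input for epsilon and feeding it at inference time. Whether that trade-off is acceptable depends on the use case.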