ERROR:tf2onnx.tfonnx:Tensorflow op [sequential_1/bidirectional_1/forward_lstm_1/CudnnRNNV3: CudnnRNNV3] is not supported
Opening this issue following the discussion at https://github.com/keras-team/keras/issues/21533#issuecomment-3455553930
Exporting the following model using:
keras==3.12.0, tf2onnx==1.16.1, onnx==1.19.1, protobuf==3.20.3, tensorflow==2.19.0
from keras.models import Sequential
from keras.layers import Input, Conv1D, MaxPooling1D, Bidirectional, LSTM, Dropout, Dense
from keras.regularizers import l2

# sam_sz, num_features_in, conv_lay, filter, ker_size, act, pool_size,
# lstm_lay and act2 are hyperparameters defined elsewhere in my script.
model = Sequential()
model.add(Input(shape=(sam_sz, num_features_in)))  # Use calculated input shape
for _ in range(conv_lay):
    model.add(Conv1D(filters=filter, kernel_size=ker_size, activation=act, padding='same', data_format='channels_last'))
    model.add(MaxPooling1D(pool_size=pool_size, data_format='channels_last'))
for _ in range(lstm_lay):
    # For intermediate LSTM layers, return sequences
    model.add(Bidirectional(LSTM(100, return_sequences=True, kernel_regularizer=l2(0.0001))))  # Use Bidirectional LSTM
    model.add(Dropout(0.3))
model.add(Bidirectional(LSTM(100, return_sequences=False, kernel_regularizer=l2(0.0001))))  # Use Bidirectional LSTM
model.add(Dropout(0.3))
model.add(Dense(units=3, activation=act2, kernel_regularizer=l2(0.0001)))
with:
model.export(output_path, format="onnx")
I observed the following issues:
WARNING:tf2onnx.shape_inference:Cannot infer shape for sequential_1/bidirectional_1/forward_lstm_1/CudnnRNNV3: sequential_1/bidirectional_1/forward_lstm_1/CudnnRNNV3:3,sequential_1/bidirectional_1/forward_lstm_1/CudnnRNNV3:4
WARNING:tf2onnx.shape_inference:Cannot infer shape for sequential_1/bidirectional_1/backward_lstm_1/CudnnRNNV3: sequential_1/bidirectional_1/backward_lstm_1/CudnnRNNV3:3,sequential_1/bidirectional_1/backward_lstm_1/CudnnRNNV3:4
ERROR:tf2onnx.tfonnx:Tensorflow op [sequential_1/bidirectional_1/forward_lstm_1/CudnnRNNV3: CudnnRNNV3] is not supported
ERROR:tf2onnx.tfonnx:Tensorflow op [sequential_1/bidirectional_1/backward_lstm_1/CudnnRNNV3: CudnnRNNV3] is not supported
ERROR:tf2onnx.tfonnx:Unsupported ops: Counter({'CudnnRNNV3': 2})
Would appreciate your attention.
Hi @josephgiting, thanks for reporting this. I've tested your code with Keras 3.12.0 and I'm not able to reproduce the error you've mentioned; please refer to this gist and let me know if I missed anything.
@dhantule Many thanks for looking into this issue so quickly. Using your link, I reran it on Google Colab with the T4 GPU runtime type, and you can observe the issue there (along with the pip freeze output): https://colab.research.google.com/gist/josephgiting/bde42a3999e9cd764984b5aec156a41e/-21799.ipynb
NOTE: This issue is not reproducible with the CPU runtime type.
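For reference, one way to check the CPU code path even on a GPU runtime is to hide the GPUs from TensorFlow before building the model (a minimal sketch; this mirrors what the CPU runtime does implicitly):

import tensorflow as tf

# Hiding the GPUs forces Keras to pick the generic (non-cuDNN) LSTM kernel,
# so the exported graph should not contain CudnnRNNV3 ops.
tf.config.set_visible_devices([], "GPU")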
Please let me know if I can assist further. Kind Regards,
Hi @josephgiting, thanks for letting me know, we'll look into this.
To me this looks like a spot where tf2onnx does not support the required ops we'd need for RNN model export. I am not sure that is something we could fix on the Keras side, but it would be great to add support for it in tf2onnx.
Here's a relevant issue -> https://github.com/onnx/tensorflow-onnx/issues/2359
You might be able to skirt around the issue by passing use_cudnn=False to your RNN layers for now (see the sketch below). Long term we'd probably want to add support in tf2onnx unless there's a particular reason not to.
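Roughly, applied to the LSTM lines from the snippet above, that workaround would look like this (a sketch, not tested here):

from keras.layers import Bidirectional, LSTM
from keras.regularizers import l2

# use_cudnn=False makes Keras use the generic LSTM kernel even on GPU, so the
# exported graph contains standard LSTM ops rather than CudnnRNNV3.
model.add(Bidirectional(LSTM(100, return_sequences=True,
                             kernel_regularizer=l2(0.0001),
                             use_cudnn=False)))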
Also tagging @james77777778, who added ONNX support, in case he has any ideas on how we could work around this during export so things keep working while we're missing coverage for these ops.
Thanks @mattdangerw
I have updated my gist with use_cudnn=False and it works.
Loading the ONNX model with onnxruntime works as well.
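For reference, the check was along these lines (a minimal sketch; output_path, sam_sz and num_features_in are the same placeholders as in the original snippet):

import numpy as np
import onnxruntime as ort

# Load the exported model and run a dummy batch through it.
sess = ort.InferenceSession(output_path)
input_name = sess.get_inputs()[0].name
dummy = np.zeros((1, sam_sz, num_features_in), dtype=np.float32)
outputs = sess.run(None, {input_name: dummy})
print(outputs[0].shape)  # expect (1, 3) from the 3-unit Dense head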
Thanks! I think we can leave this open, since the bug is still valid, just to track it while we wait for support from tf2onnx.
Experiments show that setting use_cudnn=False significantly increases training time, which likely indicates that CUDA (GPU acceleration) is not being used for the LSTM layers, effectively the same as running them on CPU; that would explain why the issue does not appear in that configuration.
If that's the case, the relevant team may need to investigate and fix this issue.