coremltools
coremltools does not export LSTM state?
❓Question
I am trying to convert an LSTM-based model from TensorFlow to Core ML to be used in a macOS application. Despite looking at hundreds of examples in the documentation and spending two days going through every reference to a similar problem, I can't find a solution. Here is a compact example that replicates the problem.
First we define a very simple LSTM model and export it to CoreML using coremltools:
import os
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow import keras
import coremltools
model = keras.Sequential()
model.add(layers.Input(dtype="float32", batch_input_shape=(1,1,1)))
model.add(layers.LSTM(3))
model.add(layers.Dense(1))
model.summary()
coreml_model_file = './test.mlpackage'
mlmodel = coremltools.convert(model,source="tensorflow")
mlmodel.save(coreml_model_file)
The tensorflow version is 2.13.0, the coremltools version is 6.3.0, and the Python version is 3.11. I also tried tensorflow 2.12.0; it does not change anything.
Dragging the .mlpackage into Xcode imports it without errors, but I get the following model specification: [screenshot of my model]
The documentation (https://developer.apple.com/documentation/coreml/making_predictions_with_a_sequence_of_inputs?language=objc) suggests that the states of the LSTM layer should be exported as additional inputs and outputs, so the network can be applied to a new arbitrary sequence. Whatever I do, I cannot get stateIn/stateOut to appear in the Core ML model.
What am I doing wrong? How do I export an LSTM model properly?
I tried different versions of tensorflow 2 and coremltools. A stateful LSTM does not export at all, and changing the input shape to (1, None, 1) does not help either.
I would really appreciate some help.
@vyshemirsky Core ML conversion preserves the I/O parity of your original LSTM model; i.e., if your original model has 1 input and 1 output, so does the converted Core ML model.
I ran the following code from your example:
import os
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow import keras
import coremltools
model = keras.Sequential()
model.add(layers.Input(dtype="float32", batch_input_shape=(1,1,1)))
model.add(layers.LSTM(3))
model.add(layers.Dense(1))
model.summary()
print(model.inputs)
print(model.outputs)
and I see the i/o for the keras model is 1 input and 1 output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (1, 3) 60
dense (Dense) (1, 1) 4
=================================================================
Total params: 64
Trainable params: 64
Non-trainable params: 0
_________________________________________________________________
[<KerasTensor: shape=(1, 1, 1) dtype=float32 (created by layer 'input_1')>]
[<KerasTensor: shape=(1, 1) dtype=float32 (created by layer 'dense')>]
So I think what you are observing is the correct behavior.
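Given that parity rule, one way to get state in/out tensors in the converted model is to make the LSTM states explicit inputs and outputs of the Keras model itself, using the functional API with `return_state=True` and `initial_state`. A minimal sketch (the tensor names and the unit count of 3 are illustrative, and I have not verified this against every coremltools version):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

units = 3

# The sequence input plus the hidden and cell states as explicit model inputs
x_in = layers.Input(shape=(1, 1), batch_size=1, dtype="float32", name="x")
h_in = layers.Input(shape=(units,), batch_size=1, dtype="float32", name="h_in")
c_in = layers.Input(shape=(units,), batch_size=1, dtype="float32", name="c_in")

# return_state=True exposes the updated (h, c) alongside the layer output
lstm_out, h_out, c_out = layers.LSTM(units, return_state=True)(
    x_in, initial_state=[h_in, c_in]
)
y = layers.Dense(1)(lstm_out)

model = keras.Model(inputs=[x_in, h_in, c_in], outputs=[y, h_out, c_out])
print(model.inputs)   # three inputs: x, h_in, c_in
print(model.outputs)  # three outputs: y, h_out, c_out
```

Since this Keras model has three inputs and three outputs, converting it with `coremltools.convert(model, source="tensorflow")` should, by the same parity rule, produce a Core ML model whose state tensors you can feed back in between calls.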