model.fit() bug when using a zipped Dataset as input for a multiple-input model
("Cross-post" of https://github.com/tensorflow/tensorflow/issues/54271)
System information.
- Have I written custom code (as opposed to using a stock example script provided in Keras): yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
- TensorFlow installed from (source or binary): pip
- TensorFlow version (use command below): 2.9.0.dev20220202
- Python version: 3.10.2
Describe the problem.
With a multi-input model, when a dataset that returns a tuple of multiple elements is fed to the tf.keras.Model.fit() method, the first element of the tuple is used as the input for the whole model, instead of the whole tuple being used as input (with each element distributed to its corresponding input).
Describe the current behavior
I have a custom model which takes 3 images as input
I have 3 separate (currently unbatched as I debug this error) datasets, classes encoded as categorical, meaning each input tensor has shape ((x, y, z), (c,))
Trying to input the 3 datasets separately fails, either by passing them as a dict mapping each ds to a named input {"Input1": ds1, "Input2": ds2, "Input3": ds3}, or by using a list [ds1, ds2, ds3].
I zip the three datasets. Testing the resulting dataset with (using the docs as guidance):
for element in zipped_ds.as_numpy_iterator():
    print("element", element)
Outputs:
element [[[x1, y1, z1], [c1,]], [[x2, y2, z2], [c2,]], [[x3, y3, z3], [c3,]]]
element [[[x1, y1, z1], [c1,]], [[x2, y2, z2], [c2,]], [[x3, y3, z3], [c3,]]]
...
Seems to work, right? Every call to the iterator returns 3 elements. However, when I use the zipped dataset as input to model.fit(), the first element in the tuple returned by the dataset object is treated as the input for the whole model. Instead of using [[[x1, y1, z1], [c1,]], [[x2, y2, z2], [c2,]], [[x3, y3, z3], [c3,]]] as the input to the model, it uses [[x1, y1, z1], [c1,]], and the training fails.
I've tried many approaches, like using zipped_ds.as_numpy_iterator() or ([ds1, ds2, ds3] for idx, (ds1, ds2, ds3) in enumerate(zipped_ds)), but both fail as the returned item is empty.
Standalone code to reproduce the issue
# %%
import os
import tensorflow as tf # tensorflow nightly, version>=2.5
from tensorflow import keras
from tensorflow.image import crop_to_bounding_box as tfimgcrop
from tensorflow.keras.preprocessing import image_dataset_from_directory
BATCH_SIZE=32 # Adjust?
IMG_SIZE=(224, 224)
IMG_SHAPE = IMG_SIZE + (3,)
# %%
_URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=_URL, extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')
train_dir = os.path.join(PATH, 'train')
validation_dir = os.path.join(PATH, 'validation')
train_dataset = tf.keras.preprocessing.image_dataset_from_directory(train_dir,
shuffle=False,
label_mode='categorical',
batch_size=32,
image_size=IMG_SIZE)
validation_dataset = tf.keras.preprocessing.image_dataset_from_directory(validation_dir,
shuffle=False,
label_mode='categorical',
batch_size=32,
image_size=IMG_SIZE)
# %%
base_model1 = tf.keras.applications.MobileNetV3Large(input_shape=(64, 64, 3),
include_top=False,
weights='imagenet',
minimalistic=False,
pooling=max,
dropout_rate=0.2)
base_model2 = tf.keras.applications.MobileNetV3Large(input_shape=(64, 64, 3),
include_top=False,
weights='imagenet',
minimalistic=False,
pooling=max,
dropout_rate=0.2)
base_model3 = tf.keras.applications.MobileNetV3Large(input_shape=(64, 64, 3),
include_top=False,
weights='imagenet',
minimalistic=False,
pooling=max,
dropout_rate=0.2)
# %%
pre_concat_layer1 = tf.keras.layers.Dense(64,
activation='relu',
kernel_initializer='random_uniform',
bias_initializer='zeros')
pre_concat_layer2 = tf.keras.layers.Dense(64,
activation='relu',
kernel_initializer='random_uniform',
bias_initializer='zeros')
pre_concat_layer3 = tf.keras.layers.Dense(64,
activation='relu',
kernel_initializer='random_uniform',
bias_initializer='zeros')
post_concat_layer = tf.keras.layers.Dense(128,
activation='relu',
kernel_initializer='random_uniform',
bias_initializer='zeros')
prediction_layer = tf.keras.layers.Dense(2,
activation='softmax',
kernel_initializer='random_uniform',
bias_initializer='zeros')
# %%
input1 = tf.keras.Input(shape=(64, 64, 3), name="First")
input2 = tf.keras.Input(shape=(64, 64, 3), name="Second")
input3 = tf.keras.Input(shape=(64, 64, 3), name="Third")
x = base_model1(input1, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = pre_concat_layer1(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.BatchNormalization()(x)
body1 = tf.keras.Model(input1, outputs)
x = base_model2(input2, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = pre_concat_layer2(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.BatchNormalization()(x)
body2 = tf.keras.Model(input2, outputs)
x = base_model3(input3, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = pre_concat_layer3(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.BatchNormalization()(x)
body3 = tf.keras.Model(input3, outputs)
# %%
body1.get_layer("MobilenetV3large")._name = "MobilenetV3large1"
body2.get_layer("MobilenetV3large")._name = "MobilenetV3large2"
body3.get_layer("MobilenetV3large")._name = "MobilenetV3large3"
# %%
combinedInput = tf.keras.layers.concatenate([body1.output, body2.output, body3.output])
x = post_concat_layer(combinedInput)
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.BatchNormalization()(x)
foutput = prediction_layer(x)
final_model = tf.keras.Model(inputs=[body1.input, body2.input, body3.input], outputs=foutput)
# %%
def resize_data1(images, classes):
    return (tfimgcrop(images,
                      offset_height=0,
                      offset_width=0,
                      target_height=64,
                      target_width=64),
            classes)
def resize_data2(images, classes):
    return (tfimgcrop(images,
                      offset_height=0,
                      offset_width=64,
                      target_height=64,
                      target_width=64),
            classes)
def resize_data3(images, classes):
    return (tfimgcrop(images,
                      offset_height=0,
                      offset_width=128,
                      target_height=64,
                      target_width=64),
            classes)
# %%
train_dataset_unb = train_dataset.unbatch()
train_dataset1 = train_dataset_unb.map(resize_data1)
train_dataset2 = train_dataset_unb.map(resize_data2)
train_dataset3 = train_dataset_unb.map(resize_data3)
train_dataset_zip = tf.data.Dataset.zip((train_dataset1, train_dataset2, train_dataset3))
validation_dataset_unb = validation_dataset.unbatch()
validation_dataset1 = validation_dataset_unb.map(resize_data1)
validation_dataset2 = validation_dataset_unb.map(resize_data2)
validation_dataset3 = validation_dataset_unb.map(resize_data3)
validation_dataset_zip = tf.data.Dataset.zip((validation_dataset1, validation_dataset2, validation_dataset3))
# %%
final_model.compile()
# %%
history = final_model.fit(train_dataset_zip,
epochs=999,
validation_data=validation_dataset_zip,
validation_steps=32
)
@ghylander
An actual error log would be great, since it is difficult to point out the error just from the code.
However, here's my shot in the dark.
train_dataset_unb = train_dataset.unbatch()
train_dataset1 = train_dataset_unb.map(resize_data1)
train_dataset2 = train_dataset_unb.map(resize_data2)
train_dataset3 = train_dataset_unb.map(resize_data3)
train_dataset_zip = tf.data.Dataset.zip((train_dataset1, train_dataset2, train_dataset3))
This results in each sample looking like
((x1, x2, x3), (y1,y2,y3)) # not what you are expecting
Please correct me if I'm wrong.
The code I posted is executable top to bottom (given you meet the dependency, tensorflow). While it uses the cats and dogs tensorflow dataset, it mimics exactly my real code, down to the same error with model.fit()
The error I get after the last line is:
ValueError: in user code:
File "/home/ghylander/miniconda3/envs/CH/lib/python3.10/site-packages/keras/engine/training.py", line 1021, in train_function *
return step_function(self, iterator)
File "/home/ghylander/miniconda3/envs/CH/lib/python3.10/site-packages/keras/engine/training.py", line 1010, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/home/ghylander/miniconda3/envs/CH/lib/python3.10/site-packages/keras/engine/training.py", line 1000, in run_step **
outputs = model.train_step(data)
File "/home/ghylander/miniconda3/envs/CH/lib/python3.10/site-packages/keras/engine/training.py", line 859, in train_step
y_pred = self(x, training=True)
File "/home/ghylander/miniconda3/envs/CH/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/ghylander/miniconda3/envs/CH/lib/python3.10/site-packages/keras/engine/input_spec.py", line 200, in assert_input_compatibility
raise ValueError(f'Layer "{layer_name}" expects {len(input_spec)} input(s),'
ValueError: Layer "model_3" expects 3 input(s), but it received 2 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(64, 64, 3) dtype=float32>, <tf.Tensor 'IteratorGetNext:1' shape=(1,) dtype=float32>]
Regarding what the train_dataset_zip object returns, this code:
for element in train_dataset_zip.as_numpy_iterator():
    print("element", element)
returns:
element [[[x1, y1, z1], [c1,]], [[x2, y2, z2], [c2,]], [[x3, y3, z3], [c3,]]]
element [[[x1, y1, z1], [c1,]], [[x2, y2, z2], [c2,]], [[x3, y3, z3], [c3,]]]
and this code:
for idx, (ds1, ds2, ds3) in enumerate(train_dataset_zip):
    print("ds1: ", ds1)
    print("ds2: ", ds2)
    print("ds3: ", ds3)
returns:
ds1: (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds2: (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds3: (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds1: (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds2: (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds3: (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
Both methods return the 3 elements you'd expect to be returned, ((image1, class1), (image2, class2), (image3, class3)). Furthermore, the traceback already reveals that only the first (I assume it's the first) element (image1, class1) is reaching the model:
ValueError: Layer "model_3" expects 3 input(s), but it received 2 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(64, 64, 3) dtype=float32>, <tf.Tensor 'IteratorGetNext:1' shape=(1,) dtype=float32>]
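One way to read this traceback: a fit-style training loop conventionally treats each dataset element as an (x, y, sample_weight) tuple, so a 3-tuple of (image, label) pairs would be split by position rather than passed to the model whole. The plain-Python sketch below mimics that convention. `split_element` is a hypothetical stand-in written for illustration, not Keras's own code, and strings stand in for tensors.

```python
def split_element(element):
    """Split one dataset element by position, as a fit-style loop would.

    Hypothetical stand-in for illustration only: position 0 is the model
    input x, position 1 the target y, position 2 the sample weights.
    """
    if not isinstance(element, tuple):
        return element, None, None
    if len(element) == 1:
        return element[0], None, None
    if len(element) == 2:
        return element[0], element[1], None
    if len(element) == 3:
        return element
    raise ValueError(f"Expected at most 3 items, got {len(element)}")


# One element of the zipped dataset: three (image, label) pairs.
zipped_element = (("img1", "c1"), ("img2", "c2"), ("img3", "c3"))

x, y, sample_weight = split_element(zipped_element)
print("x:", x)  # the only part that would be handed to the model as input
print("y:", y)
print("sample_weight:", sample_weight)
```

Under this convention x is the first (image, label) pair, which contains two items. That would be consistent with the "expects 3 input(s), but it received 2 input tensors" message.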
Additionally, the tf.data.Dataset.zip docs show what the behaviour of the method is:
>>> a = tf.data.Dataset.range(1, 4) # ==> [ 1, 2, 3 ]
>>> b = tf.data.Dataset.range(4, 7) # ==> [ 4, 5, 6 ]
>>> ds = tf.data.Dataset.zip((a, b))
>>> list(ds.as_numpy_iterator())
[(1, 4), (2, 5), (3, 6)]
While I'm at this, I'd also like to inquire about using individual datasets, instead of a zipped one
I tried to do:
history = model.fit([train_dataset1, train_dataset2, train_dataset3],
epochs=epochs,
callbacks=callbacks,
validation_data=validation_dataset_zip,
validation_steps=steps
)
but I get this error:
ValueError: Failed to find data adapter that can handle input: (<class 'list'> containing values of types {"<class 'tensorflow.python.data.ops.dataset_ops.MapDataset'>"}), <class 'NoneType'>
If I try to use a dict with the name inputs instead:
history = model.fit({'Input1': train_dataset1, 'Input2': train_dataset2, 'Input3': train_dataset3},
epochs=epochs,
callbacks=callbacks,
validation_data=validation_dataset_zip,
validation_steps=steps
)
I get:
ValueError: Failed to find data adapter that can handle input: (<class 'dict'> containing {"<class 'str'>"} keys and {"<class 'tensorflow.python.data.ops.dataset_ops.MapDataset'>"} values), <class 'NoneType'>
I'm confused as to why I get these errors. The only thing I can think of is an inability to handle Dataset objects nested in a list or dict.
@sachinprasadhs Was able to replicate the issue on colab using TF v2.8.0 and tf-nightly (2.9.0.dev20220302). Please find the gist here. Thanks!
Hi, any progress on this matter? I know it's possible to write custom training loops, but this seems like something the built-in training loop should be able to handle.
@ghylander I am attaching your workaround here just for reference.
thanks for the input @gowthamkpr, but that's my very own SO issue I opened before I figured out:
- How to approach/implement what I wanted to do (use 3 separate datasets as input for a single model with 3 different inputs, and distribute each input accordingly)
- How to achieve this (using tf.data.Dataset.zip()), although there appears to be a bug.
In the issue I mention the only workaround that I can think of, writing a custom training loop. However, I've kept this open as I feel this kind of behaviour should be supported by the keras built in model.fit() training loop method
up, I have the same issue
While working on a project of my own, I have gotten stuck for a day trying to figure out why I can't get the training to work, only to find that I'm having the same issue as described here.
@ghylander
I have a working script (i.e., the model trains smoothly), but I am not sure if that is what you want. Please take a look and let me know:
The key issue here is that we are inspecting only the inputs and not the outputs. Note that the model architecture is 3 inputs, 1 output. Thus, our dataset should be in that format as well. See the image below.
Now, the current dataset has the structure ((img, label), (img, label), (img, label)), but actually we want it to be ((img, img, img), label). So, we simply write a function which does exactly this and map the dataset accordingly.
def post_zip_process(example1, example2, example3):
    print((example1[0], example2[0], example3[0]), example1[1])
    return (example1[0], example2[0], example3[0]), example1[1]
train_dataset_zip = train_dataset_zip.map(post_zip_process)
validation_dataset_zip = validation_dataset_zip.map(post_zip_process)
And the training works just fine. Please take a look at the gist here.
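The restructuring itself can be checked without TensorFlow. As a minimal sketch, the same regrouping is applied below to plain tuples, with strings standing in for tensors and the built-in zip standing in for tf.data.Dataset.zip. The sample values are made up, and the sketch assumes, as the function above does, that all three pairs of one element share a label.

```python
def post_zip_process(example1, example2, example3):
    """Regroup three (image, label) pairs into ((img, img, img), label)."""
    # Assumption: the three crops come from the same source image, so the
    # label of the first pair is valid for all of them.
    return (example1[0], example2[0], example3[0]), example1[1]


# Made-up stand-ins for the three cropped datasets.
crops1 = [("left_0", "cat"), ("left_1", "dog")]
crops2 = [("mid_0", "cat"), ("mid_1", "dog")]
crops3 = [("right_0", "cat"), ("right_1", "dog")]

# Built-in zip yields one (pair, pair, pair) triple per sample.
regrouped = [post_zip_process(*triple) for triple in zip(crops1, crops2, crops3)]

for inputs, label in regrouped:
    print(len(inputs), "inputs ->", inputs, "| label ->", label)
```

Each regrouped element is a 2-tuple whose first item holds exactly three inputs, matching a model with 3 inputs and 1 output.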
/cc @Faptimus420 @rozhanroukhosh @gowthamkpr @sushreebarsa
@AdityaKane2001 Thank you very much, that worked. I did have to add a .batch(1) function after .map(post_zip_process) for the network to train (.batch(32) would not work), but it worked. I feel like I should have been able to realize this myself, though the documentation on the .fit function does not seem to really talk about this situation of using tf.data datasets in a multi-input network well, only about using raw tensors or numpy arrays. Maybe a slight revision there should be considered...
I do have a follow-up question though: Is there a way to combine your solution with passing in a dict for the x argument of .fit, where I could still use the format {'input1': tf.data dataset1, 'input2': tf.data dataset2, ...}?
@Faptimus420
Instead of ((img, img, img), label), we can have {"inp1":img,"inp2":img,"inp3":img}, label. The format you mentioned is not possible because:
If x is a dataset, generator, or [keras.utils.Sequence](https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence) instance, y should not be specified (since targets will be obtained from x).
source: https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit
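As a sketch of the dict-keyed layout, the function below regroups one zipped element into ({name: image}, label). The name `to_named_inputs` is hypothetical, strings stand in for tensors, and the keys are assumed to match the `name` arguments given to `tf.keras.Input` in the reproduction script ("First", "Second", "Third").

```python
def to_named_inputs(example1, example2, example3):
    """Regroup three (image, label) pairs into ({input_name: image}, label)."""
    inputs = {
        "First": example1[0],   # assumed to match tf.keras.Input(name="First")
        "Second": example2[0],  # assumed to match tf.keras.Input(name="Second")
        "Third": example3[0],   # assumed to match tf.keras.Input(name="Third")
    }
    # Assumption: all three pairs carry the same label, so the first is used.
    return inputs, example1[1]


inputs, label = to_named_inputs(("img1", "c"), ("img2", "c"), ("img3", "c"))
print(sorted(inputs), label)
```

With tf.data, a function of this shape would be applied to the zipped dataset through .map(), in the same place where post_zip_process is used.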
@ghylander
Requesting to close this issue in case it is resolved. Thank you.
Hi, sorry, just saw this
I'll try your working script and report back
I tried the method you shared, both on my own dataset and in your Python notebook, and ran into the same problem in both cases: during training, the validation values stay fixed and never increase. The model training results and my own Python notebook are attached. I would be glad if you could help. My Python Notebook: https://github.com/ahmetfurkaann/MultiView-Model/blob/main/MultiView_Model.ipynb
Your dataset result is:
My dataset result is:
@ghylander, As suggested above, I tried to execute the code with the alternative approach on tensorflow v2.13 and it was executed without any issue/error. Kindly find the gist of it here. Thank you!
Hi, checked this and it seems to work. It's been quite long, so it's honestly a bit hard to firmly confirm or deny this was the intended behaviour back then.
If I understood correctly from @AdityaKane2001's post, this wasn't really a bug but a result of misinterpretation of the documentation/expected behaviour?
@ghylander Maybe.
@ghylander, As mentioned, if the issue wasn't really a bug, could you please feel free to move this issue to closed status. Thank you!