keras

model.fit() bug when using a zipped Dataset as input for a multiple-input model

Open ghylander opened this issue 3 years ago • 8 comments

("Cross-post" of https://github.com/tensorflow/tensorflow/issues/54271)

System information.

  • Have I written custom code (as opposed to using a stock example script provided in Keras): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • TensorFlow installed from (source or binary): pip
  • TensorFlow version (use command below): 2.9.0.dev20220202
  • Python version: 3.10.2

Describe the problem.

With a multi-input model, when a dataset that returns a tuple of multiple elements is fed to the tf.keras.Model.fit() method, the whole tuple is not used as the input (with each element distributed to its own input). Instead, the first element of the tuple is used as the input for the whole model.

Describe the current behavior.

I have a custom model which takes 3 images as input. I have 3 separate datasets (currently unbatched while I debug this error) with classes encoded as categorical, meaning each dataset element has the structure ((x, y, z), (c,)). Trying to pass the 3 datasets separately fails, either as a dict mapping each dataset to a named input, {"Input1": ds1, "Input2": ds2, "Input3": ds3}, or as a list, [ds1, ds2, ds3].

So I zip the three datasets, and test the resulting dataset with the following (using the docs as guidance):

for element in zipped_ds.as_numpy_iterator():
    print("element", element)

Outputs:

element [[[x1, y1, z1], [c1,]], [[x2, y2, z2], [c2,]], [[x3, y3, z3], [c3,]]] 
element [[[x1, y1, z1], [c1,]], [[x2, y2, z2], [c2,]], [[x3, y3, z3], [c3,]]] 
...

Seems to work, right? Every call to the iterator returns 3 elements. However, when I use the zipped dataset as the input of model.fit(), the first element of the tuple returned by the dataset is treated as the input for the whole model: instead of using [[[x1, y1, z1], [c1,]], [[x2, y2, z2], [c2,]], [[x3, y3, z3], [c3,]]] as the input, it uses [[x1, y1, z1], [c1,]], and training fails.

I've tried several other approaches, such as using zipped_ds.as_numpy_iterator() or the generator ([ds1, ds2, ds3] for idx, (ds1, ds2, ds3) in enumerate(zipped_ds)) as the input, but both fail because the returned item is empty.

Standalone code to reproduce the issue
# %%
import os

import tensorflow as tf # tensorflow nightly, version>=2.5
from tensorflow import keras
from tensorflow.image import crop_to_bounding_box as tfimgcrop
from tensorflow.keras.preprocessing import image_dataset_from_directory

BATCH_SIZE=32 # Adjust?

IMG_SIZE=(224, 224)
IMG_SHAPE = IMG_SIZE + (3,)

# %%
_URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=_URL, extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')

train_dir = os.path.join(PATH, 'train')
validation_dir = os.path.join(PATH, 'validation')

train_dataset = tf.keras.preprocessing.image_dataset_from_directory(train_dir,
                                             shuffle=False,
                                             label_mode='categorical',
                                             batch_size=32,
                                             image_size=IMG_SIZE)
validation_dataset = tf.keras.preprocessing.image_dataset_from_directory(validation_dir,
                                             shuffle=False,
                                             label_mode='categorical',
                                             batch_size=32,
                                             image_size=IMG_SIZE)

# %%
base_model1 = tf.keras.applications.MobileNetV3Large(input_shape=(64, 64, 3),
                                               include_top=False,
                                               weights='imagenet',
                                               minimalistic=False,
                                               pooling=None,  # no built-in pooling; GlobalAveragePooling2D is applied below
                                               dropout_rate=0.2)
base_model2 = tf.keras.applications.MobileNetV3Large(input_shape=(64, 64, 3),
                                               include_top=False,
                                               weights='imagenet',
                                               minimalistic=False,
                                               pooling=None,  # no built-in pooling; GlobalAveragePooling2D is applied below
                                               dropout_rate=0.2)
base_model3 = tf.keras.applications.MobileNetV3Large(input_shape=(64, 64, 3),
                                               include_top=False,
                                               weights='imagenet',
                                               minimalistic=False,
                                               pooling=None,  # no built-in pooling; GlobalAveragePooling2D is applied below
                                               dropout_rate=0.2)

# %%
pre_concat_layer1 = tf.keras.layers.Dense(64, 
                                        activation='relu', 
                                        kernel_initializer='random_uniform', 
                                        bias_initializer='zeros')
pre_concat_layer2 = tf.keras.layers.Dense(64, 
                                        activation='relu', 
                                        kernel_initializer='random_uniform', 
                                        bias_initializer='zeros')
pre_concat_layer3 = tf.keras.layers.Dense(64, 
                                        activation='relu', 
                                        kernel_initializer='random_uniform', 
                                        bias_initializer='zeros')

post_concat_layer = tf.keras.layers.Dense(128, 
                                        activation='relu', 
                                        kernel_initializer='random_uniform', 
                                        bias_initializer='zeros')
prediction_layer = tf.keras.layers.Dense(2, 
                                        activation='softmax', 
                                        kernel_initializer='random_uniform', 
                                        bias_initializer='zeros')

# %%
input1 = tf.keras.Input(shape=(64, 64, 3), name="First")
input2 = tf.keras.Input(shape=(64, 64, 3), name="Second")
input3 = tf.keras.Input(shape=(64, 64, 3), name="Third")

x = base_model1(input1, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = pre_concat_layer1(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.BatchNormalization()(x)
body1 = tf.keras.Model(input1, outputs)

x = base_model2(input2, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = pre_concat_layer2(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.BatchNormalization()(x)
body2 = tf.keras.Model(input2, outputs)

x = base_model3(input3, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = pre_concat_layer3(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.BatchNormalization()(x)
body3 = tf.keras.Model(input3, outputs)

# %%
body1.get_layer("MobilenetV3large")._name = "MobilenetV3large1"
body2.get_layer("MobilenetV3large")._name = "MobilenetV3large2"
body3.get_layer("MobilenetV3large")._name = "MobilenetV3large3"

# %%
combinedInput = tf.keras.layers.concatenate([body1.output, body2.output, body3.output])
x = post_concat_layer(combinedInput)
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.BatchNormalization()(x)
foutput = prediction_layer(x)
final_model = tf.keras.Model(inputs=[body1.input, body2.input, body3.input], outputs=foutput)

# %%
def resize_data1(images, classes):
    return (tfimgcrop(images,
                        offset_height=0,
                        offset_width=0,
                        target_height=64,
                        target_width=64),
                    classes)
def resize_data2(images, classes):
    return (tfimgcrop(images,
                        offset_height=0,
                        offset_width=64,
                        target_height=64,
                        target_width=64),
                    classes)
def resize_data3(images, classes):
    return (tfimgcrop(images,
                        offset_height=0,
                        offset_width=128,
                        target_height=64,
                        target_width=64),
                    classes)

# %%
train_dataset_unb = train_dataset.unbatch()
train_dataset1 = train_dataset_unb.map(resize_data1)
train_dataset2 = train_dataset_unb.map(resize_data2)
train_dataset3 = train_dataset_unb.map(resize_data3)
train_dataset_zip = tf.data.Dataset.zip((train_dataset1, train_dataset2, train_dataset3))

validation_dataset_unb = validation_dataset.unbatch()
validation_dataset1 = validation_dataset_unb.map(resize_data1)
validation_dataset2 = validation_dataset_unb.map(resize_data2)
validation_dataset3 = validation_dataset_unb.map(resize_data3)
validation_dataset_zip = tf.data.Dataset.zip((validation_dataset1, validation_dataset2, validation_dataset3))

# %%
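final_model.compile(optimizer='adam',
                    loss='categorical_crossentropy',  # fit() needs a loss; labels use label_mode='categorical'
                    metrics=['accuracy'])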

# %%
history = final_model.fit(train_dataset_zip,
                        epochs=999, 
                        validation_data=validation_dataset_zip,
                        validation_steps=32
                        )

ghylander, Feb 06 '22 11:02

@ghylander

An actual error log would be great, since it is difficult to point out the error just from the code.

However, here's my shot in the dark.

train_dataset_unb = train_dataset.unbatch()
train_dataset1 = train_dataset_unb.map(resize_data1)
train_dataset2 = train_dataset_unb.map(resize_data2)
train_dataset3 = train_dataset_unb.map(resize_data3)
train_dataset_zip = tf.data.Dataset.zip((train_dataset1, train_dataset2, train_dataset3))

This results in each sample looking like

((x1, x2, x3), (y1,y2,y3)) # not what you are expecting

Please correct me if I'm wrong.

AdityaKane2001, Feb 09 '22 14:02

The code I posted runs top to bottom (its only dependency is TensorFlow). Although it uses the TensorFlow cats and dogs dataset, it mimics my real code exactly, down to the same error with model.fit().

The error I get after the last line is:

ValueError: in user code:

    File "/home/ghylander/miniconda3/envs/CH/lib/python3.10/site-packages/keras/engine/training.py", line 1021, in train_function  *
        return step_function(self, iterator)
    File "/home/ghylander/miniconda3/envs/CH/lib/python3.10/site-packages/keras/engine/training.py", line 1010, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/ghylander/miniconda3/envs/CH/lib/python3.10/site-packages/keras/engine/training.py", line 1000, in run_step  **
        outputs = model.train_step(data)
    File "/home/ghylander/miniconda3/envs/CH/lib/python3.10/site-packages/keras/engine/training.py", line 859, in train_step
        y_pred = self(x, training=True)
    File "/home/ghylander/miniconda3/envs/CH/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/home/ghylander/miniconda3/envs/CH/lib/python3.10/site-packages/keras/engine/input_spec.py", line 200, in assert_input_compatibility
        raise ValueError(f'Layer "{layer_name}" expects {len(input_spec)} input(s),'

    ValueError: Layer "model_3" expects 3 input(s), but it received 2 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(64, 64, 3) dtype=float32>, <tf.Tensor 'IteratorGetNext:1' shape=(1,) dtype=float32>]

Regarding what the train_dataset_zip object returns, this code:

for element in train_dataset_zip.as_numpy_iterator():
    print("element", element)

returns:

element [[[x1, y1, z1], [c1,]], [[x2, y2, z2], [c2,]], [[x3, y3, z3], [c3,]]] 
element [[[x1, y1, z1], [c1,]], [[x2, y2, z2], [c2,]], [[x3, y3, z3], [c3,]]] 

and this code:

for idx, (ds1, ds2, ds3) in enumerate(train_dataset_zip):
    print("ds1: ", ds1)
    print("ds2: ", ds2)
    print("ds3: ", ds3)

returns:

ds1:  (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds2:  (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds3:  (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds1:  (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds2:  (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>
ds3:  (<tf.Tensor: shape=(64, 64, 3), dtype=float32, numpy=[(large array with raw pixel values)], , dtype=float32)>, <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>

Both methods return the 3 elements you'd expect to be returned, ((image1, class1), (image2, class2), (image3, class3)). Furthermore, the traceback already reveals that only the first (I assume it's the first) element (image1, class1) is reaching the model:

    ValueError: Layer "model_3" expects 3 input(s), but it received 2 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(64, 64, 3) dtype=float32>, <tf.Tensor 'IteratorGetNext:1' shape=(1,) dtype=float32>]
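The two tensors in this error are consistent with how Model.fit splits each dataset element: the fit documentation says a dataset should yield (inputs, targets) or (inputs, targets, sample_weights), so a 3-tuple of (image, label) pairs would be read as inputs (image1, class1), targets (image2, class2) and sample weights (image3, class3). The pure-Python sketch below roughly mirrors that splitting rule (as implemented, as far as I can tell, by tf.keras.utils.unpack_x_y_sample_weight in TF 2.x); unpack_like_keras and the string placeholders are hypothetical, chosen so the sketch runs without TensorFlow.

```python
def unpack_like_keras(data):
    """Split one dataset element into (x, y, sample_weight), roughly as Model.fit does.

    Hypothetical helper for illustration; plain Python values stand in for tensors.
    """
    if isinstance(data, list):
        data = tuple(data)  # lists are treated like tuples
    if not isinstance(data, tuple):
        return data, None, None  # a bare element (e.g. a dict) is inputs only
    if len(data) == 1:
        return data[0], None, None  # (x,)
    if len(data) == 2:
        return data[0], data[1], None  # (x, y)
    if len(data) == 3:
        return data[0], data[1], data[2]  # (x, y, sample_weight)
    raise ValueError(f"Expected a tuple of length 1 to 3, got length {len(data)}")


# One element of train_dataset_zip: three (image, label) pairs.
element = (("img1", "c1"), ("img2", "c2"), ("img3", "c3"))
x, y, sample_weight = unpack_like_keras(element)
print(x)  # ('img1', 'c1'): two "input tensors" for a model expecting three
```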

Additionally, the tf.data.Dataset.zip docs show what the behaviour of the method is:

>>> a = tf.data.Dataset.range(1, 4)  # ==> [ 1, 2, 3 ]
>>> b = tf.data.Dataset.range(4, 7)  # ==> [ 4, 5, 6 ]
>>> ds = tf.data.Dataset.zip((a, b))
>>> list(ds.as_numpy_iterator())
[(1, 4), (2, 5), (3, 6)]
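Applied to datasets whose elements are already (image, label) pairs, the same zipping nests each pair inside an outer 3-tuple instead of grouping the images together and the labels together. Python's built-in zip behaves analogously, so it can be used to check the element structure without TensorFlow; the lists below are hypothetical stand-ins for train_dataset1 to train_dataset3.

```python
# Hypothetical stand-ins for train_dataset1..3: each yields (image, label) pairs.
ds1 = [("img1_a", "c_a"), ("img1_b", "c_b")]
ds2 = [("img2_a", "c_a"), ("img2_b", "c_b")]
ds3 = [("img3_a", "c_a"), ("img3_b", "c_b")]

zipped = list(zip(ds1, ds2, ds3))
first = zipped[0]
print(first)
# (('img1_a', 'c_a'), ('img2_a', 'c_a'), ('img3_a', 'c_a'))
```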

ghylander, Feb 10 '22 07:02

While I'm at it, I'd also like to ask about using the individual datasets instead of a zipped one.

I tried to do:

    history = model.fit([train_dataset1, train_dataset2, train_dataset3],
                        epochs=epochs, 
                        callbacks=callbacks,
                        validation_data=validation_dataset_zip,
                        validation_steps=steps
                        )

but I get this error:

ValueError: Failed to find data adapter that can handle input: (<class 'list'> containing values of types {"<class 'tensorflow.python.data.ops.dataset_ops.MapDataset'>"}), <class 'NoneType'>

If I try to use a dict with the name inputs instead:

    history = model.fit({'Input1': train_dataset1, 'Input2': train_dataset2, 'Input3': train_dataset3},
                        epochs=epochs, 
                        callbacks=callbacks,
                        validation_data=validation_dataset_zip,
                        validation_steps=steps
                        )

I get:

ValueError: Failed to find data adapter that can handle input: (<class 'dict'> containing {"<class 'str'>"} keys and {"<class 'tensorflow.python.data.ops.dataset_ops.MapDataset'>"} values), <class 'NoneType'>

I'm confused as to why I get these errors; the only explanation I can think of is that fit() cannot handle Dataset objects nested in a list or dict.
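These errors match the documented input types of Model.fit: x may be a single tf.data.Dataset, but a Python list or dict whose values are datasets is not one of the accepted forms. The per-input datasets therefore have to be combined into one dataset first; tf.data.Dataset.zip accepts a nested structure of datasets, including a dict, and then yields one dict per element. The pure-Python sketch below only mimics that structural behaviour with plain lists; zip_dict is a hypothetical helper, not a TensorFlow function, and the keys assume the Input names used in the reproduction.

```python
def zip_dict(streams):
    """Turn a dict of equally long iterables into a list of dicts (hypothetical helper).

    Structurally similar to tf.data.Dataset.zip({...}), which yields one dict per element.
    """
    keys = list(streams)
    return [dict(zip(keys, values)) for values in zip(*(streams[k] for k in keys))]


# Hypothetical per-input streams of (image, label) pairs, keyed by Input name.
streams = {
    "First": [("img1_a", "c_a"), ("img1_b", "c_b")],
    "Second": [("img2_a", "c_a"), ("img2_b", "c_b")],
    "Third": [("img3_a", "c_a"), ("img3_b", "c_b")],
}
elements = zip_dict(streams)
print(elements[0])
# {'First': ('img1_a', 'c_a'), 'Second': ('img2_a', 'c_a'), 'Third': ('img3_a', 'c_a')}
```

Each value is still an (image, label) pair, so the labels would still need to be split out into an (inputs, label) element before calling fit.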

ghylander, Feb 10 '22 07:02

@sachinprasadhs I was able to replicate the issue on Colab using TF v2.8.0 and tf-nightly (2.9.0.dev20220302); please find the gist here. Thanks!

sushreebarsa, Mar 02 '22 17:03

Hi, any progress on this matter? I know it's possible to write a custom training loop, but this seems like something the built-in training loop should be able to handle.

ghylander, Mar 24 '22 09:03

@ghylander I am attaching your workaround here just for reference.

gowthamkpr, Aug 16 '22 17:08

Thanks for the input, @gowthamkpr, but that's my own Stack Overflow question, which I opened before I figured out:

  1. How to approach/implement what I wanted to do (use 3 separate datasets as input for a single model with 3 different inputs, and distribute each input accordingly)
  2. How to achieve this (using tf.data.Dataset.zip()), although there appeared to be a bug.

In that question I mention the only workaround I can think of: writing a custom training loop. However, I've kept this issue open because I feel this kind of behaviour should be supported by the Keras built-in model.fit() training loop.

ghylander, Aug 22 '22 06:08

up, I have the same issue

rozhanroukhosh, Sep 28 '22 09:09

While working on a project of my own, I have gotten stuck for a day trying to figure out why I can't get the training to work, only to find that I'm having the same issue as described here.

Faptimus420, Nov 12 '22 18:11

@ghylander

I have a working script (i.e., the model trains smoothly), but I am not sure if that is what you want. Please take a look and let me know:

The key issue here is that we are inspecting only the inputs and not the outputs. Note that the model has 3 inputs and 1 output, so our dataset should be in that format as well. See the image below.

Now, the current dataset has the structure: ((img, label), (img, label), (img, label)) but actually we want it to be ((img, img, img), label). So, we simply write a function which does exactly this and map the dataset accordingly.

def post_zip_process(example1, example2, example3):
    print((example1[0], example2[0], example3[0]), example1[1])
    return (example1[0], example2[0], example3[0]), example1[1]

train_dataset_zip = train_dataset_zip.map(post_zip_process)
validation_dataset_zip = validation_dataset_zip.map(post_zip_process)

And the training works just fine. Please take a look at the gist here.
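Because Dataset.map passes the components of each zipped tuple element as separate positional arguments, post_zip_process only rearranges tuple structure, so its logic can be checked on plain Python tuples. The sketch below is a variant without the debug print (inside map, a Python print typically runs only while the function is traced, not once per element; tf.print would run per element). regroup and the placeholder strings are hypothetical, and taking the label from the first dataset assumes all three datasets come from the same unshuffled source, so their labels agree.

```python
def regroup(example1, example2, example3):
    """Map ((img, label), (img, label), (img, label)) -> ((img, img, img), label)."""
    images = (example1[0], example2[0], example3[0])
    label = example1[1]  # assumes the three labels are identical
    return images, label


element = (("img1", "c"), ("img2", "c"), ("img3", "c"))
inputs, label = regroup(*element)
print(inputs, label)  # ('img1', 'img2', 'img3') c
```

In the reproduction this would be applied as train_dataset_zip.map(regroup), in the same way as post_zip_process.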

[image]

/cc @Faptimus420 @rozhanroukhosh @gowthamkpr @sushreebarsa

AdityaKane2001, Nov 13 '22 08:11

@AdityaKane2001 Thank you very much, that worked, although I did have to add a .batch(1) call after .map(post_zip_process) for the network to train (.batch(32) would not work). I feel like I should have been able to realize this myself, though the documentation of the .fit method does not really cover using tf.data datasets with a multi-input network, only raw tensors or NumPy arrays. Maybe a slight revision there should be considered...
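Most likely a .batch(...) call is needed because the datasets were unbatched before zipping (train_dataset.unbatch() in the reproduction), so each element holds a single (64, 64, 3) image, whereas Model.fit treats every dataset element as one batch and the model inputs expect a leading batch dimension. The sketch below is a rough pure-Python analogue of what Dataset.batch(n) does to an ((img1, img2, img3), label) element: each component gains a new leading batch axis, and the last batch may be smaller (the default drop_remainder=False). batch_elements is a hypothetical helper and plain lists stand in for tensors.

```python
def batch_elements(elements, batch_size):
    """Group consecutive ((img1, img2, img3), label) elements into batches.

    Hypothetical helper: each component becomes a list along a new leading
    "batch" axis; the final batch may be smaller (like drop_remainder=False).
    """
    batches = []
    for start in range(0, len(elements), batch_size):
        chunk = elements[start:start + batch_size]
        n_inputs = len(chunk[0][0])
        inputs = tuple([el[0][i] for el in chunk] for i in range(n_inputs))
        labels = [el[1] for el in chunk]
        batches.append((inputs, labels))
    return batches


elements = [(("a1", "a2", "a3"), 0), (("b1", "b2", "b3"), 1), (("c1", "c2", "c3"), 0)]
batches = batch_elements(elements, 2)
print(batches[0])  # ((['a1', 'b1'], ['a2', 'b2'], ['a3', 'b3']), [0, 1])
print(batches[1])  # ((['c1'], ['c2'], ['c3']), [0])
```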

I do have a follow-up question though: Is there a way to combine your solution with passing in a dict for the x argument of .fit, where I could still use the format {'input1': tf.data dataset1, 'input2': tf.data dataset2, ...}?

Faptimus420, Nov 14 '22 15:11

@Faptimus420

Instead of ((img, img, img), label), we can have ({"inp1": img, "inp2": img, "inp3": img}, label). The format you mentioned is not possible because:

If x is a dataset, generator, or [keras.utils.Sequence](https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence) instance, y should not be specified (since targets will be obtained from x).

source: https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit
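A dict-keyed version of the mapping could look like the sketch below, assuming the keys match the name= arguments of the tf.keras.Input layers in the reproduction ("First", "Second", "Third"); for a functional model, Keras matches dict inputs to model inputs by those names. to_named_inputs is a hypothetical helper, checked here on plain tuples.

```python
# Names given to tf.keras.Input(..., name=...) in the reproduction.
INPUT_NAMES = ("First", "Second", "Third")


def to_named_inputs(example1, example2, example3):
    """Map three (image, label) pairs to ({input_name: image}, label)."""
    images = (example1[0], example2[0], example3[0])
    return dict(zip(INPUT_NAMES, images)), example1[1]  # label from the first dataset


features, label = to_named_inputs(("img1", "c"), ("img2", "c"), ("img3", "c"))
print(features)  # {'First': 'img1', 'Second': 'img2', 'Third': 'img3'}
```

In the pipeline it would be used as train_dataset_zip.map(to_named_inputs), so x is still a single dataset and y is not passed separately.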

AdityaKane2001, Nov 14 '22 15:11

@ghylander

Requesting to close this issue in case it is resolved. Thank you.

AdityaKane2001, Nov 15 '22 15:11

Hi, sorry, just saw this

I'll try your working script and report back

ghylander, Jan 26 '23 10:01

I tried the method you shared on my own dataset and in your Python notebook, and ran into the same problem in both: during training, the validation metrics stay at a fixed value and never improve. The training results and my own Python notebook are attached. I would be glad for any help. My Python Notebook: https://github.com/ahmetfurkaann/MultiView-Model/blob/main/MultiView_Model.ipynb

Your dataset result is: [image]

My dataset result is: [image]

ahmetfurkaann, Jan 27 '23 19:01

@ghylander, I ran the code with the alternative approach suggested above on TensorFlow v2.13 and it executed without any errors. Kindly find the gist here. Thank you!

tilakrayal, Aug 24 '23 08:08

Hi, I checked this and it seems to work. It's been quite a while, so honestly it's a bit hard to firmly confirm or deny whether this was the intended behaviour back then.

If I understood correctly from @AdityaKane2001's post, this wasn't really a bug but a result of misinterpretation of the documentation/expected behaviour?

ghylander, Aug 24 '23 09:08

@ghylander Maybe.

AdityaKane2001, Aug 25 '23 18:08

@ghylander, as mentioned, if this wasn't really a bug, please feel free to close this issue. Thank you!

tilakrayal, Aug 30 '23 13:08

Are you satisfied with the resolution of your issue?

google-ml-butler[bot], Aug 31 '23 06:08