[Bug] With the TensorFlow backend, using a PyTorch DataLoader that yields batches of different shapes does not work.
Hi all,
consider a model taking as input sequences of dynamic sizes (sentences padded to the longest one). Here is a simple example:
import keras

inputs = keras.Input([None, 2])
hidden = keras.layers.MaxPooling1D()(inputs)
outputs = keras.layers.Dense(1)(hidden)
m = keras.Model(inputs=inputs, outputs=outputs)
m.compile(optimizer=keras.optimizers.Adam(), loss=keras.losses.MeanSquaredError())
We provide the data using a DataLoader returning batches of different shapes. In practice, a collate_fn padding each batch would be used (a sketch follows after the reproducer); here I simply generate the individual examples within a batch to already have the same size:
import numpy as np
import torch

batch_size = 3
xs = [np.ones([4, 2])] * batch_size + [np.ones([5, 2])] * batch_size
ys = np.ones([2 * batch_size, 1])
dataset = torch.utils.data.StackDataset(xs, ys)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size)
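For completeness, a collate_fn padding each batch to its longest example could look roughly like the sketch below (pad_collate is a hypothetical helper, not part of the reproducer):

def pad_collate(batch):
    # Hypothetical helper: pad variable-length examples in a batch to
    # the longest one, so each batch stacks into a single array.
    xs, ys = zip(*batch)
    max_len = max(x.shape[0] for x in xs)
    padded = np.zeros((len(xs), max_len, xs[0].shape[1]), dtype=np.float32)
    for i, x in enumerate(xs):
        padded[i, : x.shape[0]] = x
    return padded, np.stack(ys)

# dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, collate_fn=pad_collate)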
Then training with model.fit(dataloader):
- works with PyTorch backend
- fails with TensorFlow backend, with the following error:
TypeError: `generator` yielded an element of shape (3, 5, 2) where an element of shape (None, 4, 2) was expected.
The problem is caused by the fact that the output_signature for tf.data.Dataset.from_generator is generated from the first batch (with only the shape of axis 0 replaced by None, to handle differing batch sizes).
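To make this concrete, here is a standalone reproduction of that inference step (a hedged illustration, not Keras's actual code): the spec built from the first batch of shape (3, 4, 2) becomes TensorSpec(shape=(None, 4, 2)), which the second batch then violates.

import numpy as np
import tensorflow as tf

batches = [np.ones([3, 4, 2]), np.ones([3, 5, 2])]

# Infer the signature from the first batch, replacing only axis 0 by None.
spec = tf.TensorSpec(shape=(None,) + batches[0].shape[1:], dtype=tf.float64)
ds = tf.data.Dataset.from_generator(lambda: iter(batches), output_signature=spec)

try:
    for batch in ds:
        print(batch.shape)  # (3, 4, 2) passes
except Exception as err:
    print(err)  # `generator` yielded an element of shape (3, 5, 2) ...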
Maybe the solution would be to set all dimensions in the output_signature to None? That way there would be no static information about the shapes, but it seems like the only valid solution.
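For instance, a wrapper along these lines could build such a signature (a minimal sketch of that idea; fully_dynamic_spec and dataloader_to_tf_dataset are hypothetical helpers, not existing Keras API):

import numpy as np
import tensorflow as tf

def fully_dynamic_spec(value):
    # Hypothetical helper: keep only rank and dtype, set every dimension to None.
    value = np.asarray(value)
    return tf.TensorSpec(shape=(None,) * value.ndim, dtype=tf.as_dtype(value.dtype))

def dataloader_to_tf_dataset(dataloader):
    # Hypothetical helper: peek at the first batch only for ranks and dtypes.
    first_x, first_y = next(iter(dataloader))
    signature = (fully_dynamic_spec(first_x), fully_dynamic_spec(first_y))
    return tf.data.Dataset.from_generator(
        lambda: ((np.asarray(x), np.asarray(y)) for x, y in dataloader),
        output_signature=signature,
    )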
See https://colab.research.google.com/drive/1FBgGviMV5yVmUpQQNRKpU67DSsqwHuPF?usp=sharing, which demonstrates that training works with the PyTorch backend but not with the TensorFlow backend, using the current keras-nightly.
Please provide the full code.
The full code is in the Colab I reference in the original issue. Is it not enough? It can be run and demonstrates the problem.
@foxik, for the record: setting all dimensions in the output_signature to None did not work, because those dimensions are used to build the model. The chosen approach is somewhat unsatisfying but hopefully will work well enough.
I hoped that the specification in keras.layers.Input would be enough to provide the required static shape. I.e., if you have keras.layers.Input([None, 3]), then it would not matter that the dataset thinks its inputs are [None, None, None]. But I can imagine that for the targets, for example, the heuristic for the "accuracy" metric needs to inspect the shape of the target output, which is provided only by the dataset and not by keras.layers.Input, true.
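(A small illustration of what I mean, using only the public API: the Input spec itself carries the static feature dimension even when the time dimension is dynamic.)

import keras

inputs = keras.Input([None, 3])
print(inputs.shape)  # (None, None, 3): batch and time dynamic, feature dimension static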
Anyway, thank you very much for your work!