Shape errors linked to batch_size and num_microbatches even if `batch_size % num_microbatches == 0`
Hi,
I wanted to benchmark training a model with and without TF Privacy. My problem is a modified version of MNIST classification where I only classify the digits 5 to 9. After filtering the input, the train set size is 29404. I set batch_size=32
and num_microbatches=16
. The model starts training, but at the end of the first epoch I get an error:
Train on 29404 samples, validate on 4861 samples
Epoch 1/15
29344/29404 [============================>.] - ETA: 0s - loss: 1.6178 - acc: 0.2068
...
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input to reshape is a tensor with 28 values, but the requested shape requires a multiple of 16
[[{{node training/Reshape}}]]
[[loss_1/mul/_59]]
(1) Invalid argument: Input to reshape is a tensor with 28 values, but the requested shape requires a multiple of 16
[[{{node training/Reshape}}]]
0 successful operations.
0 derived errors ignored.
I am using tf.keras
(Sequential) in a JupyterLab instance running on GCP AI Platform with one NVIDIA Tesla K80 GPU
TF v1.15
tensorflow-privacy v0.2.2
Since I couldn't figure out the cause, I ran the tutorial Classification_Privacy.ipynb
on Colab and changed batch_size=64
and num_microbatches=16
. I get a different error, but it is still related to a shape mismatch:
Train on 60000 samples, validate on 10000 samples
Epoch 1/15
59968/60000 [============================>.] - ETA: 0s - loss: 2.3192 - acc: 0.1403
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-8-f5e717f6bd9c> in <module>()
4 epochs=epochs,
5 validation_data=(test_data, test_labels),
----> 6 batch_size=batch_size)
3 frames
/tensorflow-1.15.2/python3.6/tensorflow_core/python/keras/engine/training_utils.py in aggregate(self, batch_outs, batch_start, batch_end)
130 self.results[0] += batch_outs[0]
131 else:
--> 132 self.results[0] += batch_outs[0] * (batch_end - batch_start)
133 # Metrics (always stateful, just grab current values.)
134 self.results[1:] = batch_outs[1:]
ValueError: operands could not be broadcast together with shapes (64,) (32,) (64,)
This last error is similar to #96, but my batch_size
is a multiple of num_microbatches
.
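If it helps, here is the arithmetic I suspect is behind both errors (a sketch using NumPy as a stand-in for the per-example loss tensor; the assumption is that the DP optimizer reshapes the per-example loss into num_microbatches groups, as the `training/Reshape` node in the trace suggests):

```python
import numpy as np

num_microbatches = 16

# My setup: 29404 samples with batch_size=32 leaves a final partial
# batch of 28 samples.
last_batch = 29404 % 32  # 28
loss = np.zeros(last_batch)
try:
    # Splitting the per-example loss into num_microbatches groups fails
    # because 28 is not a multiple of 16 -- matching "a tensor with 28
    # values, but the requested shape requires a multiple of 16".
    loss.reshape(num_microbatches, -1)
except ValueError as err:
    print(err)

# The tutorial on Colab: 60000 samples with batch_size=64 leaves a
# final batch of 32, which would explain the (64,) vs (32,) broadcast
# error in the Keras aggregator.
print(60000 % 64)  # 32
```

So in both cases the failure only shows up on the very last, partial batch of the epoch.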
I can provide the full stack trace of the first error if needed. Thanks
Yes please do provide the full stack trace.
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-24-c6acfd19e5ca> in <module>
3 batch_size=batch_size,
4 epochs=15,
----> 5 validation_data=(x_test_n, y_test_b))
6
7 score = model.evaluate(x_test_n, y_test_b, verbose=0)
/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
725 max_queue_size=max_queue_size,
726 workers=workers,
--> 727 use_multiprocessing=use_multiprocessing)
728
729 def evaluate(self,
/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_arrays.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, **kwargs)
673 validation_steps=validation_steps,
674 validation_freq=validation_freq,
--> 675 steps_name='steps_per_epoch')
676
677 def evaluate(self,
/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_arrays.py in model_iteration(model, inputs, targets, sample_weights, batch_size, epochs, verbose, callbacks, val_inputs, val_targets, val_sample_weights, shuffle, initial_epoch, steps_per_epoch, validation_steps, validation_freq, mode, validation_in_fit, prepared_feed_values_from_dataset, steps_name, **kwargs)
392
393 # Get outputs.
--> 394 batch_outs = f(ins_batch)
395 if not isinstance(batch_outs, list):
396 batch_outs = [batch_outs]
/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/keras/backend.py in __call__(self, inputs)
3474
3475 fetched = self._callable_fn(*array_vals,
-> 3476 run_metadata=self.run_metadata)
3477 self._call_fetch_callbacks(fetched[-len(self._fetches):])
3478 output_structure = nest.pack_sequence_as(
/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py in __call__(self, *args, **kwargs)
1470 ret = tf_session.TF_SessionRunCallable(self._session._session,
1471 self._handle, args,
-> 1472 run_metadata_ptr)
1473 if run_metadata:
1474 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input to reshape is a tensor with 28 values, but the requested shape requires a multiple of 16
[[{{node training/Reshape}}]]
[[loss_1/mul/_59]]
(1) Invalid argument: Input to reshape is a tensor with 28 values, but the requested shape requires a multiple of 16
[[{{node training/Reshape}}]]
0 successful operations.
0 derived errors ignored.
Hi, did you solve this?
For some unknown reason, the total number of input samples must be a multiple of the batch size. So, as a temporary workaround, if you have 60,000 samples and a batch size of 64, simply truncate your input to 59968 (= 64 * 937) samples. This solves the problem.
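In code, the truncation looks like this (a sketch with placeholder arrays standing in for the real MNIST data and labels):

```python
import numpy as np

batch_size = 64
train_data = np.zeros((60000, 28, 28))  # placeholder for the images
train_labels = np.zeros(60000)          # placeholder for the labels

# Keep only a whole number of batches: 60000 // 64 = 937 batches,
# i.e. 59968 samples, so no partial batch is left at the end.
n = (len(train_data) // batch_size) * batch_size
train_data, train_labels = train_data[:n], train_labels[:n]
print(len(train_data))  # 59968
```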