
Shape errors linked to batch_size and num_microbatches even if `batch_size % num_microbatches == 0`

Open · giulbia opened this issue 4 years ago · 4 comments

Hi, I wanted to benchmark training a model with and without TF Privacy. My task is a modified version of MNIST classification in which I only classify the digits 5 to 9. After filtering the input accordingly, the training set size is 29404. I set batch_size=32 and num_microbatches=16, so batch_size is evenly divisible by num_microbatches. The model starts training, but at the end of the first epoch I get an error:

Train on 29404 samples, validate on 4861 samples
Epoch 1/15
29344/29404 [============================>.] - ETA: 0s - loss: 1.6178 - acc: 0.2068
...
InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 28 values, but the requested shape requires a multiple of 16
	 [[{{node training/Reshape}}]]
	 [[loss_1/mul/_59]]
  (1) Invalid argument: Input to reshape is a tensor with 28 values, but the requested shape requires a multiple of 16
	 [[{{node training/Reshape}}]]
0 successful operations.
0 derived errors ignored.

I am using tf.keras (Sequential) in a JupyterLab instance running on GCP AI Platform with one NVIDIA Tesla K80 GPU, TF v1.15, and tensorflow-privacy v0.2.2.
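For context, my training setup looks roughly like this. This is a minimal sketch, not my exact code: the data, the model layers, and the DP hyperparameters (l2_norm_clip, noise_multiplier, learning_rate) are placeholders, but it follows the same DPGradientDescentGaussianOptimizer pattern as the tutorial:

import numpy as np
import tensorflow as tf
from tensorflow_privacy.privacy.optimizers.dp_optimizer import DPGradientDescentGaussianOptimizer

batch_size = 32
num_microbatches = 16  # batch_size % num_microbatches == 0

# Hypothetical placeholder data with the shapes of my real set:
# 29404 filtered MNIST images, one-hot labels for 5 classes (digits 5..9).
train_data = np.zeros((29404, 28, 28), dtype=np.float32)
train_labels = np.eye(5, dtype=np.float32)[np.random.randint(0, 5, 29404)]

optimizer = DPGradientDescentGaussianOptimizer(
    l2_norm_clip=1.0,       # placeholder value
    noise_multiplier=1.1,   # placeholder value
    num_microbatches=num_microbatches,
    learning_rate=0.15)     # placeholder value

# The loss must return one value per example so the optimizer can
# group the per-example losses into microbatches before clipping.
loss = tf.keras.losses.CategoricalCrossentropy(
    from_logits=True, reduction=tf.losses.Reduction.NONE)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(5)])  # 5 classes: digits 5..9

model.compile(optimizer=optimizer, loss=loss, metrics=['acc'])
model.fit(train_data, train_labels, epochs=15, batch_size=batch_size)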

Since I couldn't find a clue, I ran the tutorial Classification_Privacy.ipynb on Colab with batch_size=64 and num_microbatches=16. I get a different error, but still one related to a shape mismatch:

Train on 60000 samples, validate on 10000 samples
Epoch 1/15
59968/60000 [============================>.] - ETA: 0s - loss: 2.3192 - acc: 0.1403
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-f5e717f6bd9c> in <module>()
      4           epochs=epochs,
      5           validation_data=(test_data, test_labels),
----> 6           batch_size=batch_size)

3 frames
/tensorflow-1.15.2/python3.6/tensorflow_core/python/keras/engine/training_utils.py in aggregate(self, batch_outs, batch_start, batch_end)
    130       self.results[0] += batch_outs[0]
    131     else:
--> 132       self.results[0] += batch_outs[0] * (batch_end - batch_start)
    133     # Metrics (always stateful, just grab current values.)
    134     self.results[1:] = batch_outs[1:]

ValueError: operands could not be broadcast together with shapes (64,) (32,) (64,)

This last error is similar to #96, but my batch_size is a multiple of num_microbatches.
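One thing I noticed while re-reading the first trace: the "28 values" in the error message matches the size of my final partial batch, which is not a multiple of num_microbatches, so that may be what the reshape is choking on. A quick check of the arithmetic:

train_size = 29404
batch_size = 32
num_microbatches = 16

last_batch = train_size % batch_size     # 28 samples left over in the final batch
print(last_batch)                        # 28
print(last_batch % num_microbatches)     # 12, i.e. 28 is not a multiple of 16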

I can provide the full stack trace of the first error if needed. Thanks

giulbia · Apr 12 '20 22:04

Yes, please do provide the full stack trace.

galenmandrew · Apr 13 '20 17:04

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-24-c6acfd19e5ca> in <module>
      3                     batch_size=batch_size,
      4                     epochs=15,
----> 5                     validation_data=(x_test_n, y_test_b))
      6 
      7 score = model.evaluate(x_test_n, y_test_b, verbose=0)

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    725         max_queue_size=max_queue_size,
    726         workers=workers,
--> 727         use_multiprocessing=use_multiprocessing)
    728 
    729   def evaluate(self,

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_arrays.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, **kwargs)
    673         validation_steps=validation_steps,
    674         validation_freq=validation_freq,
--> 675         steps_name='steps_per_epoch')
    676 
    677   def evaluate(self,

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_arrays.py in model_iteration(model, inputs, targets, sample_weights, batch_size, epochs, verbose, callbacks, val_inputs, val_targets, val_sample_weights, shuffle, initial_epoch, steps_per_epoch, validation_steps, validation_freq, mode, validation_in_fit, prepared_feed_values_from_dataset, steps_name, **kwargs)
    392 
    393         # Get outputs.
--> 394         batch_outs = f(ins_batch)
    395         if not isinstance(batch_outs, list):
    396           batch_outs = [batch_outs]

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/keras/backend.py in __call__(self, inputs)
   3474 
   3475     fetched = self._callable_fn(*array_vals,
-> 3476                                 run_metadata=self.run_metadata)
   3477     self._call_fetch_callbacks(fetched[-len(self._fetches):])
   3478     output_structure = nest.pack_sequence_as(

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py in __call__(self, *args, **kwargs)
   1470         ret = tf_session.TF_SessionRunCallable(self._session._session,
   1471                                                self._handle, args,
-> 1472                                                run_metadata_ptr)
   1473         if run_metadata:
   1474           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 28 values, but the requested shape requires a multiple of 16
	 [[{{node training/Reshape}}]]
	 [[loss_1/mul/_59]]
  (1) Invalid argument: Input to reshape is a tensor with 28 values, but the requested shape requires a multiple of 16
	 [[{{node training/Reshape}}]]
0 successful operations.
0 derived errors ignored.

giulbia · Apr 13 '20 20:04

Hi, did you solve this?

Raymond-Xue · Jul 01 '20 17:07

For some reason, the total number of input samples needs to be a multiple of the batch size, so that no partial batch is left over at the end of an epoch. As a temporary workaround, if you have 60,000 samples and a batch size of 64, simply truncate your input to 59,968 (= 64 * 937) samples. This solves the problem.
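A minimal sketch of that truncation, assuming the data lives in NumPy arrays (train_data and train_labels are hypothetical names; substitute your own):

import numpy as np

# Hypothetical placeholder arrays standing in for the real data.
train_data = np.zeros((60000, 28, 28), dtype=np.float32)
train_labels = np.zeros((60000,), dtype=np.int64)

batch_size = 64

# Keep only a whole number of batches: 60000 // 64 * 64 == 59968,
# so every batch, including the last, has exactly batch_size samples.
usable = (len(train_data) // batch_size) * batch_size
train_data = train_data[:usable]
train_labels = train_labels[:usable]
print(len(train_data))  # 59968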

fereshteh-razmi · Aug 16 '21 18:08