
ValueError: Layer model expects 1 input(s), but it received 2 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, None, None, None) dtype=uint8>, <tf.Tensor 'IteratorGetNext:1' shape=(None, None, None) dtype=float32>]

Open suprateembanerjee opened this issue 3 years ago • 23 comments

Tensorflow V2 (latest) Keras (latest) ssd300_training.ipynb

I have managed to convert most of the V1 code to V2 and successfully run it. I have made changes to all the python files as necessary too. However, this issue occurs on the line

history = model.fit_generator(generator=train_generator, steps_per_epoch=steps_per_epoch, epochs=final_epoch, callbacks=callbacks, validation_data=val_generator, validation_steps=ceil(val_dataset_size/batch_size), initial_epoch=initial_epoch)

Entire error:


Epoch 1/120

Epoch 00001: LearningRateScheduler reducing learning rate to 0.001.

ValueError                                Traceback (most recent call last)
in
      4 steps_per_epoch = 1000
      5
----> 6 history = model.fit_generator(generator=train_generator,
      7                               steps_per_epoch=steps_per_epoch,
      8                               epochs=final_epoch,

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1844                   'will be removed in a future version. '
   1845                   'Please use Model.fit, which supports generators.')
-> 1846     return self.fit(
   1847         generator,
   1848         steps_per_epoch=steps_per_epoch,

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1097                 _r=1):
   1098               callbacks.on_train_batch_begin(step)
-> 1099               tmp_logs = self.train_function(iterator)
   1100               if data_handler.should_sync:
   1101                 context.async_wait()

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
    782     tracing_count = self.experimental_get_tracing_count()
    783     with trace.Trace(self._name) as tm:
--> 784       result = self._call(*args, **kwds)
    785       compiler = "xla" if self._experimental_compile else "nonXla"
    786       new_tracing_count = self.experimental_get_tracing_count()

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
    825       # This is the first call of __call__, so we have to initialize.
    826       initializers = []
--> 827       self._initialize(args, kwds, add_initializers_to=initializers)
    828     finally:
    829       # At this point we know that the initialization is complete (or less

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in _initialize(self, args, kwds, add_initializers_to)
    679     self._graph_deleter = FunctionDeleter(self._lifted_initializer_graph)
    680     self._concrete_stateful_fn = (
--> 681         self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
    682             *args, **kwds))
    683

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\eager\function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
   2995       args, kwargs = None, None
   2996     with self._lock:
-> 2997       graph_function, _ = self._maybe_define_function(args, kwargs)
   2998     return graph_function
   2999

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\eager\function.py in _maybe_define_function(self, args, kwargs)
   3387
   3388           self._function_cache.missed.add(call_context_key)
-> 3389           graph_function = self._create_graph_function(args, kwargs)
   3390           self._function_cache.primary[cache_key] = graph_function
   3391

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\eager\function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
   3222     arg_names = base_arg_names + missing_arg_names
   3223     graph_function = ConcreteFunction(
-> 3224         func_graph_module.func_graph_from_py_func(
   3225             self._name,
   3226             self._python_function,

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\framework\func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
    994         _, original_func = tf_decorator.unwrap(python_func)
    995
--> 996       func_outputs = python_func(*func_args, **func_kwargs)
    997
    998       # invariant: func_outputs contains only Tensors, CompositeTensors,

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in wrapped_fn(*args, **kwds)
    588         xla_context.Exit()
    589     else:
--> 590       out = weak_wrapped_fn().wrapped(*args, **kwds)
    591     return out
    592

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\framework\func_graph.py in wrapper(*args, **kwargs)
    981           except Exception as e:  # pylint:disable=broad-except
    982             if hasattr(e, "ag_error_metadata"):
--> 983               raise e.ag_error_metadata.to_exception(e)
    984             else:
    985               raise

ValueError: in user code:

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py:804 train_function  *
    return step_function(self, iterator)
c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py:794 step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:1259 run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2730 call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:3417 _call_for_each_replica
    return fn(*args, **kwargs)
c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py:787 run_step  **
    outputs = model.train_step(data)
c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py:753 train_step
    y_pred = self(x, training=True)
c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\base_layer.py:1000 __call__
    input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\input_spec.py:204 assert_input_compatibility
    raise ValueError('Layer ' + layer_name + ' expects ' +

ValueError: Layer model expects 1 input(s), but it received 2 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, None, None, None) dtype=uint8>, <tf.Tensor 'IteratorGetNext:1' shape=(None, None, None) dtype=float32>]

This Stackoverflow post (https://stackoverflow.com/questions/61586981/valueerror-layer-sequential-20-expects-1-inputs-but-it-received-2-input-tensor#) suggests it has something to do with fit() parameter validation_data. It points to a change in structural requirements, which has been changed from lists to tuples across tfv1.x and tfv2.x. However, we are not using a structure at all, but a generator to accomplish our task. I don't understand what is going wrong.
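One reading of the error, offered as a hedged sketch rather than a confirmed diagnosis: the `Model.fit` documentation describes a generator as yielding `(inputs, targets)` tuples, and the message above shows both yielded arrays arriving as model inputs. That is what would happen if only a tuple is split into inputs and targets, while a list is passed through whole as the inputs. The function `split_batch` below is a hypothetical, simplified model of that behavior in plain Python; it is an assumption for illustration, not the library's actual code.

```python
def split_batch(data):
    """Simplified, assumed model of how a yielded batch is split.

    Only a tuple is unpacked into (x, y, sample_weight); anything else,
    including a list, is treated as the model inputs x.
    """
    if not isinstance(data, tuple):
        return data, None, None
    padded = data + (None,) * (3 - len(data))
    return padded[0], padded[1], padded[2]


images, labels = "batch_X", "batch_y_encoded"  # stand-ins for the real arrays

x, y, _ = split_batch([images, labels])   # list: both items end up in x
print(len(x), y)

x, y, _ = split_batch((images, labels))   # tuple: split into inputs and targets
print(x, y)
```

Under this assumption, a generator that yields `[batch_X, batch_y_encoded]` hands the model two inputs and no targets, which would produce exactly the "expects 1 input(s), but it received 2 input tensors" message.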

suprateembanerjee avatar Apr 30 '21 08:04 suprateembanerjee

Hi, thank you @suprateem48 for bringing this up. I'm actually facing the same issue. Please find my stack trace below for reference. Advice from anybody on how to solve this issue is highly appreciated. Thank you in advance!

existing dataset files found -> loading....
Loading labels: 100%|██████████| 16551/16551 [00:03<00:00, 4725.84it/s]
Loading image IDs: 100%|██████████| 16551/16551 [00:01<00:00, 9289.97it/s]
Loading evaluation-neutrality annotations: 100%|██████████| 16551/16551 [00:02<00:00, 7384.34it/s]
Loading labels: 100%|██████████| 4952/4952 [00:01<00:00, 4734.14it/s]
Loading image IDs: 100%|██████████| 4952/4952 [00:00<00:00, 9295.89it/s]
Loading evaluation-neutrality annotations: 100%|██████████| 4952/4952 [00:00<00:00, 7491.51it/s]
Number of images in the training dataset: 16551
Number of images in the validation dataset: 4952
2021-05-03 07:25:05.965847: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/120

Epoch 00001: LearningRateScheduler reducing learning rate to 0.001.
Traceback (most recent call last):
  File "D:\projects\python\ssd_test\ssd_test.py", line 288, in
    use_multiprocessing=False)
  File "C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\eager\def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\eager\def_function.py", line 871, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\eager\def_function.py", line 726, in _initialize
    *args, **kwds))
  File "C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\eager\function.py", line 2969, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\eager\function.py", line 3361, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\eager\function.py", line 3206, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\framework\func_graph.py", line 990, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\eager\def_function.py", line 634, in wrapped_fn
    out = weak_wrapped_fn().wrapped(*args, **kwds)
  File "C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\framework\func_graph.py", line 977, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\keras\engine\training.py:805 train_function  *
    return step_function(self, iterator)
C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\keras\engine\training.py:795 step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:1259 run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2730 call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:3417 _call_for_each_replica
    return fn(*args, **kwargs)
C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\keras\engine\training.py:788 run_step  **
    outputs = model.train_step(data)
C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\keras\engine\training.py:754 train_step
    y_pred = self(x, training=True)
C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\keras\engine\base_layer.py:998 __call__
    input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
C:\Program Files\Python\Python37\lib\site-packages\tensorflow\python\keras\engine\input_spec.py:207 assert_input_compatibility
    ' input tensors. Inputs received: ' + str(inputs))

ValueError: Layer model expects 1 input(s), but it received 2 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, None, None, None) dtype=uint8>, <tf.Tensor 'IteratorGetNext:1' shape=(None, None, None) dtype=float32>]

ukowa avatar May 03 '21 05:05 ukowa

Hello. I had the same problem and have been racking my brain over it for days. In the end I used validation_data=tuple(val_generator) instead of validation_data=val_generator, and the error went away. However, I then ran out of memory (Google Colab free version) and am looking for another environment.

By the way, in my case the call history = model.fit_generator(generator=train_generator, no longer works; I have to use history = model.fit(train_generator,

bfhaha avatar May 12 '21 04:05 bfhaha

@bfhaha I tried your fix, but it did not solve the issue for me. Running the code

initial_epoch = 0
final_epoch = 120
steps_per_epoch = 1000

history = model.fit(x=train_generator,
                    steps_per_epoch=steps_per_epoch,
                    epochs=final_epoch,
                    callbacks=callbacks,
                    validation_data=tuple(val_generator),
                    validation_steps=ceil(val_dataset_size/batch_size),
                    initial_epoch=initial_epoch)

still results in

ValueError                                Traceback (most recent call last)
in
      3 steps_per_epoch = 1000
      4
----> 5 history = model.fit(x=train_generator,
      6                     steps_per_epoch=steps_per_epoch,
      7                     epochs=final_epoch,

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1097                 _r=1):
   1098               callbacks.on_train_batch_begin(step)
-> 1099               tmp_logs = self.train_function(iterator)
   1100               if data_handler.should_sync:
   1101                 context.async_wait()

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
    782     tracing_count = self.experimental_get_tracing_count()
    783     with trace.Trace(self._name) as tm:
--> 784       result = self._call(*args, **kwds)
    785       compiler = "xla" if self._experimental_compile else "nonXla"
    786       new_tracing_count = self.experimental_get_tracing_count()

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
    816       # In this case we have not created variables on the first call. So we can
    817       # run the first trace but we should fail if variables are created.
--> 818       results = self._stateful_fn(*args, **kwds)
    819       if self._created_variables:
    820         raise ValueError("Creating variables on a non-first call to a function"

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\eager\function.py in __call__(self, *args, **kwargs)
   2967     with self._lock:
   2968       (graph_function,
-> 2969        filtered_flat_args) = self._maybe_define_function(args, kwargs)
   2970     return graph_function._call_flat(
   2971         filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\eager\function.py in _maybe_define_function(self, args, kwargs)
   3383           self.input_signature is None and
   3384           call_context_key in self._function_cache.missed):
-> 3385         return self._define_function_with_shape_relaxation(
   3386             args, kwargs, flat_args, filtered_flat_args, cache_key_context)
   3387

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\eager\function.py in _define_function_with_shape_relaxation(self, args, kwargs, flat_args, filtered_flat_args, cache_key_context)
   3305           expand_composites=True)
   3306
-> 3307     graph_function = self._create_graph_function(
   3308         args, kwargs, override_flat_arg_shapes=relaxed_arg_shapes)
   3309     self._function_cache.arg_relaxed[rank_only_cache_key] = graph_function

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\eager\function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
   3222     arg_names = base_arg_names + missing_arg_names
   3223     graph_function = ConcreteFunction(
-> 3224         func_graph_module.func_graph_from_py_func(
   3225             self._name,
   3226             self._python_function,

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\framework\func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
    994         _, original_func = tf_decorator.unwrap(python_func)
    995
--> 996       func_outputs = python_func(*func_args, **func_kwargs)
    997
    998       # invariant: func_outputs contains only Tensors, CompositeTensors,

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in wrapped_fn(*args, **kwds)
    588         xla_context.Exit()
    589     else:
--> 590       out = weak_wrapped_fn().wrapped(*args, **kwds)
    591     return out
    592

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\framework\func_graph.py in wrapper(*args, **kwargs)
    981           except Exception as e:  # pylint:disable=broad-except
    982             if hasattr(e, "ag_error_metadata"):
--> 983               raise e.ag_error_metadata.to_exception(e)
    984             else:
    985               raise

ValueError: in user code:

c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py:804 train_function  *
    return step_function(self, iterator)
c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py:794 step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:1259 run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2730 call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:3417 _call_for_each_replica
    return fn(*args, **kwargs)
c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py:787 run_step  **
    outputs = model.train_step(data)
c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py:753 train_step
    y_pred = self(x, training=True)
c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\base_layer.py:1000 __call__
    input_spec.assert_input_compatibility(self.input_spec, inputs, self.name)
c:\users\dolphin48\.conda\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\keras\engine\input_spec.py:204 assert_input_compatibility
    raise ValueError('Layer ' + layer_name + ' expects ' +

ValueError: Layer model expects 1 input(s), but it received 2 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, None, None, None) dtype=uint8>, <tf.Tensor 'IteratorGetNext:1' shape=(None, None, None) dtype=float32>]

suprateembanerjee avatar May 12 '21 05:05 suprateembanerjee

@suprateem48 Sorry, I really don't know where the problem is. If I were you, I would run the following two commands right after defining val_generator and compare the output: print(val_generator) # It is supposed to be <generator object DataGenerator.generate at 0x7f33b9691a50> and print(tuple(val_generator)) # It is supposed to be ()
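A complementary check, sketched here with a hypothetical helper `describe_batch`, is to pull a single batch with `next()` and look at its container type and the shape of each element. Unlike `tuple(val_generator)`, this consumes only one batch. The stand-in generator below is an assumption used to keep the snippet self-contained.

```python
def describe_batch(batch):
    """Return the container type of a batch and the shape of each element."""
    shapes = [getattr(item, "shape", None) for item in batch]
    return type(batch).__name__, shapes


def fake_generator():
    # Stand-in for val_generator; the real one yields NumPy arrays,
    # for which the reported shapes would not be None.
    while True:
        yield [[0, 1], [2, 3]]


kind, shapes = describe_batch(next(fake_generator()))
print(kind, shapes)
```

With the real generator, the first value shows whether each batch is a list or a tuple.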

bfhaha avatar May 12 '21 17:05 bfhaha

@bfhaha Weird thing, I managed to reproduce your memory issue even on my 8GB RTX 2070 Super, but this error is given only for the first time the kernel runs model.fit(). Every consecutive time model.fit() is rerun on the same kernel, it throws the old tuple-related error.

suprateembanerjee avatar May 13 '21 02:05 suprateembanerjee

@bfhaha thanks for your fix. I've tried it as well. Same here: memory error (32 GB RAM Predator, GeForce 1070). I gave it a second try with a reduced dataset of just 8 images, but got the same result. I know it doesn't really help, but I just wanted to share the information...

ukowa avatar May 13 '21 08:05 ukowa

Hello. I have tried to run the notebook on a Google Compute Engine (E2 series, e2-highmem-16, 16vCPU, 128 GB memory) 80 GB disk. It also crashed... I was running ssd7_training.ipynb, not ssd300.

bfhaha avatar May 14 '21 15:05 bfhaha

I rented a Google Compute Engine instance (N2 series, custom 8 vCPU, 640 GB memory, 200 GB disk) yesterday, and it showed the following message after running for one hour.

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-24-fb2ae06e0e3c> in <module>
      9                               epochs=final_epoch,
     10                               callbacks=callbacks,
---> 11                               validation_data=tuple(val_generator),
     12                               validation_steps=ceil(val_dataset_size/batch_size),
     13                               initial_epoch=initial_epoch)

~/data_generator/object_detection_2d_data_generator.py in generate(self, batch_size, shuffle, transformations, label_encoder, returns, keep_images_without_gt, degenerate_box_handling)
   1149                     batch_y_encoded, batch_matched_anchors = label_encoder(batch_y, diagnostics=True)
   1150                 else:
-> 1151                     batch_y_encoded = label_encoder(batch_y, diagnostics=False)
   1152                     batch_matched_anchors = None
   1153 

~/ssd_encoder_decoder/ssd_input_encoder.py in __call__(self, ground_truth_labels, diagnostics)
    311         ##################################################################################
    312 
--> 313         y_encoded = self.generate_encoding_template(batch_size=batch_size, diagnostics=False)
    314 
    315         ##################################################################################

~/ssd_encoder_decoder/ssd_input_encoder.py in generate_encoding_template(self, batch_size, diagnostics)
    604         #    shape as the SSD model output tensor. The content of this tensor is irrelevant, we'll just use
    605         #    `boxes_tensor` a second time.
--> 606         y_encoding_template = np.concatenate((classes_tensor, boxes_tensor, boxes_tensor, variances_tensor), axis=2)
    607 
    608         if diagnostics:

<__array_function__ internals> in concatenate(*args, **kwargs)

MemoryError: Unable to allocate 25.7 MiB for an array with shape (16, 11692, 18) and data type float64
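The size in this message can be checked by hand: the array holds 16 × 11692 × 18 float64 values at 8 bytes each. The short calculation below reproduces the 25.7 MiB figure. A single allocation this small failing on a 640 GB machine is consistent with memory having been used up beforehand, for example by batches accumulated in tuple(val_generator); that interpretation is an inference from the traceback, not something measured.

```python
shape = (16, 11692, 18)   # from the MemoryError message
bytes_per_float64 = 8

n_bytes = bytes_per_float64
for dim in shape:
    n_bytes *= dim

print(n_bytes)                       # 26938368
print(round(n_bytes / 2**20, 1))     # 25.7
```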

bfhaha avatar May 16 '21 16:05 bfhaha

@bfhaha Yes, this is the exact memory-related issue I faced as well.

suprateembanerjee avatar May 16 '21 16:05 suprateembanerjee

@suprateem48 Have you ever tried this solution? https://stackoverflow.com/questions/57507832/unable-to-allocate-array-with-shape-and-data-type I don't have enough money to rent the VM instance to test again.

bfhaha avatar May 17 '21 08:05 bfhaha

Same issue right here, will update if I can find something.

Edit: It seems that val_generator is infinite, i.e. next(val_generator) never runs out? Not quite sure why. But calling tuple() on an infinite generator will cause a memory error.
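That matches how `tuple()` works: it keeps pulling items until the generator stops, and it stores every one of them. The sketch below uses a hypothetical stand-in generator `endless_batches` (assumed, like the real one, to loop forever) and `itertools.islice` to take a bounded number of batches instead.

```python
from itertools import islice


def endless_batches():
    """Stand-in for a data generator that never raises StopIteration."""
    step = 0
    while True:
        yield ("images_%d" % step, "labels_%d" % step)
        step += 1


# tuple(endless_batches()) would never return and would grow until memory runs out.
first_three = tuple(islice(endless_batches(), 3))
print(len(first_three))  # 3
```

Passing the generator itself together with validation_steps, as in the original call, should bound consumption in a similar way, since only that many batches are drawn per validation run.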

JuliusJacobitz avatar May 22 '21 09:05 JuliusJacobitz

> By the way, in my case, the command history = model.fit_generator(generator=train_generator, doesn't work anymore, I have to use history = model.fit(train_generator,

@bfhaha could you show me how you used model.fit instead of model.fit_generator? :)

JuliusJacobitz avatar May 22 '21 09:05 JuliusJacobitz

@JuliusJacobitz Sorry, I don't understand what you mean by "how" I used model.fit. When I was using history = model.fit_generator(generator=train_generator it showed the following message: UserWarning: Model.fit_generator is deprecated and will be removed in a future version. Please use Model.fit, which supports generators.

So I just changed model.fit_generator to model.fit. The example here shows that we don't have to write generator= when we use model.fit

bfhaha avatar May 23 '21 08:05 bfhaha

Hi, I struggled with the same issue.

The reason was that the data generator returned the list [batch_X, batch_y_encoded]. I changed generate() of the DataGenerator class in object_detection_2d_data_generator.py as follows. Instead of yielding a list, it yields the two values batch_X and batch_y_encoded. It's a primitive solution, but it works fine.

  #########################################################################################
  # Compose the output.
  #########################################################################################
  
  ret = []
  if 'processed_images' in returns: ret.append(batch_X)
  if 'encoded_labels' in returns: ret.append(batch_y_encoded)
  if 'matched_anchors' in returns: ret.append(batch_matched_anchors)
  if 'processed_labels' in returns: ret.append(batch_y)
  if 'filenames' in returns: ret.append(batch_filenames)
  if 'image_ids' in returns: ret.append(batch_image_ids)
  if 'evaluation-neutral' in returns: ret.append(batch_eval_neutral)
  if 'inverse_transform' in returns: ret.append(batch_inverse_transforms)
  if 'original_images' in returns: ret.append(batch_original_images)
  if 'original_labels' in returns: ret.append(batch_original_labels)
  
  yield batch_X, batch_y_encoded # do not yield ret

FYI: Here is my model.fit.

history = model.fit(train_generator,
                    steps_per_epoch=ceil(train_dataset_size/batch_size),
                    epochs=final_epoch,
                    callbacks=callbacks,
                    validation_data=val_generator,
                    validation_steps=ceil(val_dataset_size/batch_size),
                    initial_epoch=initial_epoch,
                    verbose=1)

pirolone888 avatar May 31 '21 00:05 pirolone888

@pirolone888 Thanks. But it showed UnboundLocalError: local variable 'batch_X' referenced before assignment when I was trying your method.

bfhaha avatar Jun 04 '21 12:06 bfhaha

Hello, I also converted the code from tensorflow 1.x to tensorflow 2.4. I fixed the problem you are having by changing the DataGenerator in object_detection_2d_data_generator.py as such:

        ret = []
        if 'processed_images' in returns: ret.append(batch_X)
        if 'encoded_labels' in returns: ret.append(batch_y_encoded)
        if 'matched_anchors' in returns: ret.append(batch_matched_anchors)
        if 'processed_labels' in returns: ret.append(batch_y)
        if 'filenames' in returns: ret.append(batch_filenames)
        if 'image_ids' in returns: ret.append(batch_image_ids)
        if 'evaluation-neutral' in returns: ret.append(batch_eval_neutral)
        if 'inverse_transform' in returns: ret.append(batch_inverse_transforms)
        if 'original_images' in returns: ret.append(batch_original_images)
        if 'original_labels' in returns: ret.append(batch_original_labels)

        yield tuple(ret)

I simply changed yield ret to yield tuple(ret).

daviddanialy avatar Jun 14 '21 06:06 daviddanialy

@daviddanialy Thanks. So just place the following code under the function generate in object_detection_2d_data_generator.py? (It has been indented by eight spaces.)

        ret = []
        if 'processed_images' in returns: ret.append(batch_X)
        if 'encoded_labels' in returns: ret.append(batch_y_encoded)
        if 'matched_anchors' in returns: ret.append(batch_matched_anchors)
        if 'processed_labels' in returns: ret.append(batch_y)
        if 'filenames' in returns: ret.append(batch_filenames)
        if 'image_ids' in returns: ret.append(batch_image_ids)
        if 'evaluation-neutral' in returns: ret.append(batch_eval_neutral)
        if 'inverse_transform' in returns: ret.append(batch_inverse_transforms)
        if 'original_images' in returns: ret.append(batch_original_images)
        if 'original_labels' in returns: ret.append(batch_original_labels)

        yield tuple(ret)

It still showed the original error message (Layer model expects 1 input(s), but it received 2 input tensors...).

I have already given up on this project and am trying Matterport's Mask R-CNN for object detection instead.

bfhaha avatar Jun 14 '21 09:06 bfhaha

That code is already in the generate function, you just change yield ret to yield tuple(ret). I may have to switch to a different repo as well, because I'm having issues with the predictions not ever exceeding the confidence threshold.
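If editing the library file is inconvenient, the same change can be sketched as a wrapper around the existing generator. `as_tuples` is a hypothetical helper, and the stand-in generator below only imitates one that yields lists.

```python
def as_tuples(generator):
    """Re-yield each batch from `generator` as a tuple instead of a list."""
    for batch in generator:
        yield tuple(batch)


def list_batches():
    # Stand-in for DataGenerator.generate(), which yields `ret` (a list).
    while True:
        yield ["batch_X", "batch_y_encoded"]


wrapped = as_tuples(list_batches())
print(type(next(wrapped)).__name__)  # tuple
```

The wrapped generator would then be passed to model.fit in place of the original. When editing the file instead, keep in mind that a running notebook kernel goes on using the module it already imported until the kernel is restarted or the module is reloaded.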

daviddanialy avatar Jun 14 '21 10:06 daviddanialy

@daviddanialy Thanks. It doesn't work for me.

bfhaha avatar Jun 14 '21 10:06 bfhaha

Any solutions? Here is my code:

# TODO: Set the epochs to train for.
# If you're resuming a previous training, set `initial_epoch` and `final_epoch` accordingly.
initial_epoch   = 0
final_epoch     = 20
steps_per_epoch = 1000

history = model.fit(train_generator,
                    steps_per_epoch=steps_per_epoch,
                    epochs=final_epoch,
                    callbacks=callbacks,
                    validation_data=val_generator,
                    validation_steps=ceil(val_dataset_size/batch_size),
                    initial_epoch=initial_epoch)

            ret = []
            if 'processed_images' in returns: ret.append(batch_X)
            if 'encoded_labels' in returns: ret.append(batch_y_encoded)
            if 'matched_anchors' in returns: ret.append(batch_matched_anchors)
            if 'processed_labels' in returns: ret.append(batch_y)
            if 'filenames' in returns: ret.append(batch_filenames)
            if 'image_ids' in returns: ret.append(batch_image_ids)
            if 'evaluation-neutral' in returns: ret.append(batch_eval_neutral)
            if 'inverse_transform' in returns: ret.append(batch_inverse_transforms)
            if 'original_images' in returns: ret.append(batch_original_images)
            if 'original_labels' in returns: ret.append(batch_original_labels)

            yield ret

I have tried ret, tuple(ret), [ret], and I still get the following:

ValueError: Layer model expects 1 input(s), but it received 2 input tensors. Inputs received: [<tf.Tensor 'IteratorGetNext:0' shape=(None, None, None, None) dtype=uint8>, <tf.Tensor 'IteratorGetNext:1' shape=(None, None, None) dtype=float32>]

Hrrsmjd avatar Oct 26 '21 08:10 Hrrsmjd

> Hello. I had the same problem and I racked my brain these days. Finally, I use validation_data=tuple(val_generator), instead of validation_data=val_generator, The error has been solved. But I run out my memory (Google Colab Free Version) and looking for another environment.
>
> By the way, in my case, the command history = model.fit_generator(generator=train_generator, doesn't work anymore, I have to use history = model.fit(train_generator,

Tried that. It does not seem to work for me...

jy0821 avatar Mar 01 '22 19:03 jy0821

I solved the same issue for model.predict() in the case where the model has 2 inputs.

My generator output was: return (a, b)

I changed it to: return ((a, b), None)
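Sketched as a wrapper, with `a` and `b` standing for the two input batches. The helper name `nest_inputs` is hypothetical. The idea, as far as this thread suggests, is that the outer tuple is what gets split into inputs and targets, so both inputs have to sit together in its first slot.

```python
def nest_inputs(generator):
    """Turn each yielded (a, b) pair of inputs into ((a, b), None)."""
    for a, b in generator:
        yield (a, b), None


pairs = iter([("a0", "b0"), ("a1", "b1")])
for inputs, targets in nest_inputs(pairs):
    print(inputs, targets)
```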

ZFTurbo avatar Mar 20 '22 15:03 ZFTurbo

Hello! Since model.fit and model.fit_generator are essentially loops over several epochs, I stopped using them and instead wrote my own for loop that enumerates the generated dataset (I mostly use PyTorch, so an explicit for loop is more convenient for me).

[Screenshot: custom dataset generator code, Screen Shot 2023-01-12 at 11 41 16 AM]

Above, I am using a customized dataset generator (I modified this Python code: https://github.com/wjddyd66/Tensorflow2.0/blob/master/SSD/voc_data.py), which sends the generated dataset to the training code below:

[Screenshot: training loop code, Screen Shot 2023-01-12 at 11 42 00 AM]

This way I was able to fix the problem above. Hope it helps you guys!
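Since the screenshots are not reproduced here, the following is only a framework-agnostic sketch of the loop structure described above, not the commenter's code. `train_step` is a hypothetical callable that performs one optimization step and returns a loss (in TensorFlow it would typically wrap a `tf.GradientTape` block); the dummy generator and step function are placeholders.

```python
from itertools import islice


def run_training(batches, train_step, epochs, steps_per_epoch):
    """Loop over epochs and batches by hand instead of calling model.fit."""
    history = []
    for epoch in range(epochs):
        losses = []
        # islice bounds each epoch, so an endless generator is fine here.
        for images, labels in islice(batches, steps_per_epoch):
            losses.append(train_step(images, labels))
        history.append(sum(losses) / len(losses))
    return history


def dummy_batches():
    while True:
        yield 1.0, 2.0


def dummy_step(images, labels):
    # Placeholder "loss"; a real step would update model weights.
    return abs(images - labels)


print(run_training(dummy_batches(), dummy_step, epochs=2, steps_per_epoch=3))
```

Because each batch is unpacked explicitly as `images, labels`, it no longer matters whether the generator yields a list or a tuple.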

generalMG avatar Jan 12 '23 02:01 generalMG