
keras transfer learning: retraining EfficientNetB7 on a Tesla V100 32G runs out of memory

Open · yisampi opened this issue 4 years ago • 1 comment

Fine-tuning EfficientNetB7 with fit_generator fails with the following out-of-memory error:

2020-05-22 09:51:58.789020: W tensorflow/core/common_runtime/bfc_allocator.cc:271] ****************************************************************************************************
2020-05-22 09:51:58.789068: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at conv_ops.cc:446 : Resource exhausted: OOM when allocating tensor with shape[32,3840,10,10] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
OOM when allocating tensor with shape[32,3840,10,10] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node block7c_expand_conv/convolution}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[{{node loss/mul}}]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Traceback (most recent call last):
  File "train.py", line 72, in <module>
    train()
  File "train.py", line 58, in train
    model.train()
  File "/data/sam.yi/keras-transfer-learning-testing/models/base_model.py", line 69, in train
    self._fine_tuning()
  File "/data/sam.yi/keras-transfer-learning-testing/models/efficientnetb7.py", line 76, in _fine_tuning
    class_weight=self.class_weight)
  File "/data/anaconda3/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/data/anaconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/data/anaconda3/lib/python3.6/site-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/data/anaconda3/lib/python3.6/site-packages/keras/engine/training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "/data/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/data/anaconda3/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/data/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/data/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,3840,10,10] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node block7c_expand_conv/convolution}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[{{node loss/mul}}]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
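The hint at the end of the log can be acted on directly. Below is a minimal sketch, assuming the Keras 2.2.x TensorFlow backend forwards extra compile() keyword arguments (including options) through to session.run; the toy model is a placeholder, not the code from this issue:

```python
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense

# RunOptions flag named in the OOM hint: on the next OOM, TensorFlow
# reports the tensors that were allocated at that moment.
run_opts = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# Placeholder model; the point is only how `options` is threaded through.
model = Sequential([Dense(10, activation='softmax', input_shape=(100,))])

# Keras 2.2.x passes unrecognized compile() kwargs on to K.function(),
# whose TensorFlow implementation accepts `options` and `run_metadata`.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              options=run_opts)
```

With this in place, the next OOM message should include a per-tensor allocation list, which shows where the memory actually goes.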

keras: 2.2.4
tensorflow-gpu: 1.13.1
efficientnet: 1.1.0
batch_size: 1
image_size: 512x512
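Two things in the report stand out. First, the failing tensor has shape [32, 3840, 10, 10], i.e. a leading batch dimension of 32, so even though batch_size is reported as 1 above, it is worth checking what batch size the data generator actually yields to fit_generator. Second, two standard memory-pressure reducers on this stack are on-demand GPU allocation and freezing the backbone so no gradient buffers are kept for it. A minimal sketch, assuming the efficientnet 1.1.0 package's Keras module; the 10-class head, pooling layer, and optimizer are illustrative placeholders, not the code from this issue:

```python
import tensorflow as tf
from keras import backend as K
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from efficientnet.keras import EfficientNetB7

# TF 1.x: allocate GPU memory on demand instead of reserving it all up
# front, so actual usage is visible and fragmentation is reduced.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))

# Freeze the B7 backbone: no gradients or optimizer slots are kept for
# its weights, which cuts training memory substantially.
base = EfficientNetB7(weights='imagenet', include_top=False,
                      input_shape=(512, 512, 3))
for layer in base.layers:
    layer.trainable = False

# Small trainable head (placeholder: 10 classes).
x = GlobalAveragePooling2D()(base.output)
out = Dense(10, activation='softmax')(x)
model = Model(base.input, out)
model.compile(optimizer='adam', loss='categorical_crossentropy')
```

If the backbone must be unfrozen for fine-tuning, the usual fallbacks are a smaller input size, a smaller batch, or a smaller EfficientNet variant, since B7 at 512x512 stores very large activations even on a 32 GB card.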

yisampi · May 22 '20 02:05

I ran into the same problem. Have you solved it?

wangdomg · Nov 16 '20 11:11