keras-attention-augmented-convs

Memory leak?

Open · lrizzello opened this issue on Sep 24, 2019 • 3 comments

Hello and thank you for sharing your implementation,

Unfortunately, I'm running into memory issues when I try to use it on inputs that have more than 3 channels. Here's an example.

import numpy as np

from keras.layers import Input
from keras.models import Model

from attn_augconv import augmented_conv2d  # from this repo; adjust the module path if it differs

if __name__ == '__main__':
    x_size = 200
    y_size = 106
    n_data_dimensions = 20
    n_samples = 10
    batch_size = 1
    input = Input(shape=(y_size, x_size, n_data_dimensions))
    x = augmented_conv2d(input, filters=n_data_dimensions, kernel_size=(3, 3),
                         depth_k=0.2, depth_v=0.2,  # dk/v (0.2) * f_out (20) = 4
                         num_heads=4, relative_encodings=True)

    model = Model(input, x)
    model.compile(optimizer='adam', loss='mean_squared_error')
    #print("Expected memory usage", get_model_memory_usage(batch_size, model), "GB")
    model.summary()
    # Check if attention builds properly
    x = np.random.rand(n_samples, y_size, x_size, n_data_dimensions)
    y = model.fit(x, x, batch_size=batch_size, validation_split=0.1)

If you want to uncomment the print, add this extra function:

def get_model_memory_usage(batch_size, model):
    """Rough estimate, in GB, of the memory needed for layer outputs plus weights."""
    import numpy as np
    from keras import backend as K

    shapes_mem_count = 0
    for l in model.layers:
        single_layer_mem = 1
        for s in l.output_shape:
            if s is None:
                continue
            single_layer_mem *= s
        shapes_mem_count += single_layer_mem

    trainable_count = np.sum([K.count_params(p) for p in set(model.trainable_weights)])
    non_trainable_count = np.sum([K.count_params(p) for p in set(model.non_trainable_weights)])

    number_size = 4.0
    if K.floatx() == 'float16':
         number_size = 2.0
    if K.floatx() == 'float64':
         number_size = 8.0

    total_memory = number_size*(batch_size*shapes_mem_count + trainable_count + non_trainable_count)
    gbytes = np.round(total_memory / (1024.0 ** 3), 3)
    return gbytes

According to the helper above, the model should use about 0.006 GB of memory, yet I get an OOM error on a GPU with 11 GB of memory.
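As a quick sanity check of that number (my own back-of-the-envelope arithmetic, assuming float32 and batch_size = 1, using the output shapes from the model.summary() further down), the estimate only accounts for layer outputs and the 3,778 parameters:

# Per-sample layer output sizes taken from model.summary() (H=106, W=200):
layer_outputs = [
    106 * 200 * 20,  # input_1
    106 * 200 * 16,  # conv2d_1
    106 * 200 * 12,  # conv2d_2
    106 * 200 * 4,   # attention_augmentation2d_1
    106 * 200 * 4,   # conv2d_3
    106 * 200 * 20,  # concatenate_1
]
params = 3778
total_bytes = 4.0 * (1 * sum(layer_outputs) + params)  # batch_size = 1, float32
print(round(total_bytes / 1024 ** 3, 3))               # -> 0.006 (GB)

So the helper never sees any tensor the attention layer allocates internally, which is presumably where the memory actually goes.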

Here's part of the stack trace:

Using TensorFlow backend.
WARNING:tensorflow:From /home/usr-lin-ai/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Expected memory usage 0.006 GB
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 106, 200, 20) 0
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 106, 200, 12) 252         input_1[0][0]
__________________________________________________________________________________________________
attention_augmentation2d_1 (Att (None, 106, 200, 4)  610         conv2d_2[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 106, 200, 16) 2896        input_1[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 106, 200, 4)  20          attention_augmentation2d_1[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 106, 200, 20) 0           conv2d_1[0][0]
                                                                 conv2d_3[0][0]
==================================================================================================
Total params: 3,778
Trainable params: 3,778
Non-trainable params: 0
__________________________________________________________________________________________________
WARNING:tensorflow:From /home/usr-lin-ai/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Train on 9 samples, validate on 1 samples
Epoch 1/1
2019-09-24 13:56:38.401386: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-09-24 13:56:38.423284: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3696000000 Hz

...
[A bunch of lines about running out of memory]
...

2019-09-24 13:56:50.585153: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 2 Chunks of size 1696000 totalling 3.23MiB
2019-09-24 13:56:50.585175: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 35955200 totalling 34.29MiB
2019-09-24 13:56:50.585197: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 67840000 totalling 64.70MiB
2019-09-24 13:56:50.585218: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 7191040000 totalling 6.70GiB
2019-09-24 13:56:50.585238: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Sum Total of in-use chunks: 6.80GiB
2019-09-24 13:56:50.585263: I tensorflow/core/common_runtime/bfc_allocator.cc:647] Stats:
Limit:                 10989682688
InUse:                  7301049600
MaxInUse:               7613445120
NumAllocs:                     137
MaxAllocSize:           7191040000

2019-09-24 13:56:50.585313: W tensorflow/core/common_runtime/bfc_allocator.cc:271] ******************************************************************_**_______________________________
2019-09-24 13:56:50.585343: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at tile_ops.cc:124 : Resource exhausted: OOM when allocating tensor with shape[1,4,106,106,200,200] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "ai/transaction_document_parsing/InvoiceNetwork.py", line 553, in <module>
    y = model.fit(x, x, batch_size=batch_size, validation_split=0.1)
  File "/home/usr-lin-ai/anaconda3/lib/python3.7/site-packages/keras/engine/training.py", line 1039, in fit
    validation_steps=validation_steps)
  File "/home/usr-lin-ai/anaconda3/lib/python3.7/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
    outs = f(ins_batch)
  File "/home/usr-lin-ai/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/home/usr-lin-ai/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/home/usr-lin-ai/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/home/usr-lin-ai/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,4,200,200,106,106] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node attention_augmentation2d_1/Tile_1}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

         [[{{node loss/mul}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Have you run into this problem? And if so, have you found a way to solve it?

lrizzello avatar Sep 24 '19 12:09 lrizzello

This isn't a memory leak so much as an inefficient implementation. It's not the weights and parameters that consume the vast majority of the memory, but the intermediate products of the computation. The OOM occurs when allocating a tensor of shape [1, 4, 106, 106, 200, 200].

I will have to look into it.
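For reference, a rough estimate of just that one tiled relative-attention tensor, assuming float32 (4 bytes per element):

import numpy as np

# The tensor reported in the OOM message: [1, 4, 106, 106, 200, 200], float32.
shape = (1, 4, 106, 106, 200, 200)
n_bytes = np.prod(shape, dtype=np.int64) * 4
print(n_bytes)               # 7191040000 bytes
print(n_bytes / 1024 ** 3)   # ~6.70 GiB, matching the bfc_allocator log above

Since self-attention here is computed over all H * W positions, the intermediate tensors grow roughly with (H * W)^2 per head, which is why a model with only 3,778 parameters can still blow past 11 GB at 106 x 200 resolution.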

titu1994 avatar Sep 24 '19 13:09 titu1994

Alright, I see.

Thanks for your response

lrizzello avatar Sep 25 '19 08:09 lrizzello

Did you solve the ResourceExhaustedError problem?

ResourceExhaustedError: 2 root error(s) found.
(0) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[32,4,4096,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node attention_augmentation2d_7/MatMul}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

         [[batch_normalization_7/cond/else/_365/FusedBatchNormV3/_1465]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

(1) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[32,4,4096,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node attention_augmentation2d_7/MatMul}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations. 0 derived errors ignored.
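A rough check, assuming float32 (4096 would be a 64 x 64 feature map flattened for attention), suggests that one attention matrix alone cannot fit alongside much else:

# Attention logits from the error above: shape [32, 4, 4096, 4096], float32.
n_bytes = 32 * 4 * 4096 * 4096 * 4
print(n_bytes)               # 8589934592 bytes
print(n_bytes / 1024 ** 3)   # exactly 8.0 GiB for just this one tensor

With 4 heads and a batch of 32, these logits take 8 GiB before any other activations or gradients are counted.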

tuandv2021 avatar Mar 18 '22 01:03 tuandv2021