keras-attention-augmented-convs
Memory leak?
Hello and thank you for sharing your implementation,
Unfortunately, I'm running into memory issues when I try to use it on inputs that have more than 3 channels. Here's an example.
import numpy as np
from keras.layers import Input
from keras.models import Model

# Import path depends on where the repo code lives in your project
from attn_augconv import augmented_conv2d

if __name__ == '__main__':
    x_size = 200
    y_size = 106
    n_data_dimensions = 20
    n_samples = 10
    batch_size = 1

    input = Input(shape=(y_size, x_size, n_data_dimensions))
    x = augmented_conv2d(input, filters=n_data_dimensions, kernel_size=(3, 3),
                         depth_k=0.2, depth_v=0.2,  # dk/v (0.2) * f_out (20) = 4
                         num_heads=4, relative_encodings=True)

    model = Model(input, x)
    model.compile(optimizer='adam', loss='mean_squared_error')
    # print("Expected memory usage", get_model_memory_usage(batch_size, model), "GB")
    model.summary()

    # Check if attention builds properly
    x = np.random.rand(n_samples, y_size, x_size, n_data_dimensions)
    y = model.fit(x, x, batch_size=batch_size, validation_split=0.1)
If you want to uncomment the print, add this extra function:
def get_model_memory_usage(batch_size, model):
    import numpy as np
    from keras import backend as K

    # Sum of the output activation sizes of every layer
    shapes_mem_count = 0
    for l in model.layers:
        single_layer_mem = 1
        for s in l.output_shape:
            if s is None:
                continue
            single_layer_mem *= s
        shapes_mem_count += single_layer_mem

    # Number of weights, trainable and non-trainable
    trainable_count = np.sum([K.count_params(p) for p in set(model.trainable_weights)])
    non_trainable_count = np.sum([K.count_params(p) for p in set(model.non_trainable_weights)])

    # Bytes per element for the current float precision
    number_size = 4.0
    if K.floatx() == 'float16':
        number_size = 2.0
    if K.floatx() == 'float64':
        number_size = 8.0

    total_memory = number_size * (batch_size * shapes_mem_count + trainable_count + non_trainable_count)
    gbytes = np.round(total_memory / (1024.0 ** 3), 3)
    return gbytes
According to the function above, the model should only be using about 0.006 GB of memory, yet I get an OOM error on a GPU with 11 GB of RAM.
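(As a sanity check on that 0.006 GB figure, here is a rough sketch of the same estimate done by hand; the per-layer output sizes and the parameter count are read off the model summary in the log below, so this only mirrors what get_model_memory_usage computes.)

layer_output_sizes = [
    106 * 200 * 20,  # input_1
    106 * 200 * 16,  # conv2d_1
    106 * 200 * 12,  # conv2d_2
    106 * 200 * 4,   # attention_augmentation2d_1
    106 * 200 * 4,   # conv2d_3
    106 * 200 * 20,  # concatenate_1
]
params = 3778  # total params from model.summary()

# batch_size = 1, float32 = 4 bytes per element
total_bytes = 4.0 * (1 * sum(layer_output_sizes) + params)
print(total_bytes / (1024.0 ** 3))  # ~0.006 GB, matching the printed estimate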
Here's part of the stack trace:
Using TensorFlow backend.
WARNING:tensorflow:From /home/usr-lin-ai/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Expected memory usage 0.006 GB
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 106, 200, 20) 0
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 106, 200, 12) 252 input_1[0][0]
__________________________________________________________________________________________________
attention_augmentation2d_1 (Att (None, 106, 200, 4) 610 conv2d_2[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 106, 200, 16) 2896 input_1[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 106, 200, 4) 20 attention_augmentation2d_1[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 106, 200, 20) 0 conv2d_1[0][0]
conv2d_3[0][0]
==================================================================================================
Total params: 3,778
Trainable params: 3,778
Non-trainable params: 0
__________________________________________________________________________________________________
WARNING:tensorflow:From /home/usr-lin-ai/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Train on 9 samples, validate on 1 samples
Epoch 1/1
2019-09-24 13:56:38.401386: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-09-24 13:56:38.423284: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3696000000 Hz
...
[A bunch of lines about running out of memory]
...
2019-09-24 13:56:50.585153: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 2 Chunks of size 1696000 totalling 3.23MiB
2019-09-24 13:56:50.585175: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 35955200 totalling 34.29MiB
2019-09-24 13:56:50.585197: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 67840000 totalling 64.70MiB
2019-09-24 13:56:50.585218: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 7191040000 totalling 6.70GiB
2019-09-24 13:56:50.585238: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Sum Total of in-use chunks: 6.80GiB
2019-09-24 13:56:50.585263: I tensorflow/core/common_runtime/bfc_allocator.cc:647] Stats:
Limit: 10989682688
InUse: 7301049600
MaxInUse: 7613445120
NumAllocs: 137
MaxAllocSize: 7191040000
2019-09-24 13:56:50.585313: W tensorflow/core/common_runtime/bfc_allocator.cc:271] ******************************************************************_**_______________________________
2019-09-24 13:56:50.585343: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at tile_ops.cc:124 : Resource exhausted: OOM when allocating tensor with shape[1,4,106,106,200,200] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "ai/transaction_document_parsing/InvoiceNetwork.py", line 553, in <module>
y = model.fit(x, x, batch_size=batch_size, validation_split=0.1)
File "/home/usr-lin-ai/anaconda3/lib/python3.7/site-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
File "/home/usr-lin-ai/anaconda3/lib/python3.7/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
outs = f(ins_batch)
File "/home/usr-lin-ai/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "/home/usr-lin-ai/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/home/usr-lin-ai/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
run_metadata_ptr)
File "/home/usr-lin-ai/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,4,200,200,106,106] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node attention_augmentation2d_1/Tile_1}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[{{node loss/mul}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Have you run into this problem? And if so, have you found a way to solve it?
This isn't a memory leak so much as an inefficient implementation. The vast majority of the memory is not consumed by the weights and parameters, but by the intermediate products of the computation. The OOM occurs when allocating a tensor of shape [1, 4, 106, 106, 200, 200].
I will have to look into it.
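(For a sense of scale, a quick back-of-the-envelope check of that one allocation; the shape is taken verbatim from the OOM message for the attention_augmentation2d_1/Tile_1 node, and reading its axes as [batch, heads, H, H, W, W] is an assumption on my part.)

import numpy as np

# Shape reported in the OOM message (relative-encoding tile), float32 = 4 bytes per element
shape = (1, 4, 106, 106, 200, 200)
n_bytes = np.prod(shape, dtype=np.int64) * 4

print(n_bytes)                # 7191040000 -- the MaxAllocSize in the allocator stats
print(n_bytes / 1024.0 ** 3)  # ~6.70 GiB  -- the single in-use chunk the allocator lists

So that single tiled tensor accounts for almost all of the 6.80 GiB the allocator reports in use, which is why the weight-based estimate above is so far off.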
Alright, I see.
Thanks for your response.
Did you solve the ResourceExhaustedError problem?
ResourceExhaustedError: 2 root error(s) found.
(0) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[32,4,4096,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node attention_augmentation2d_7/MatMul}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[batch_normalization_7/cond/else/_365/FusedBatchNormV3/_1465]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[32,4,4096,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node attention_augmentation2d_7/MatMul}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations. 0 derived errors ignored.
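(The same quadratic scaling shows up here: the attention logits are formed over all H*W positions at once, so the [32, 4, 4096, 4096] MatMul output alone is 8 GiB; that 4096 flattened positions corresponds to, say, a 64x64 feature map is my assumption.)

import numpy as np

# Shape from the second OOM message: [batch, num_heads, H*W, H*W], float32 = 4 bytes per element
shape = (32, 4, 4096, 4096)
n_bytes = np.prod(shape, dtype=np.int64) * 4

print(n_bytes / 1024.0 ** 3)  # 8.0 GiB for the attention logits alone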