Process gets killed when running tensorflow.python.ops.ctc_ops._state_to_olabel
Issue Type
Bug
Have you reproduced the bug with TF nightly?
No
Source
binary
Tensorflow Version
2.11.0
Custom Code
Yes
OS Platform and Distribution
Ubuntu 22.04
Mobile device
No response
Python version
3.9
Bazel version
No response
GCC/Compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current Behaviour?
The process gets killed due to the allocation of a very large tensor.
Standalone code to reproduce the issue
```python
import tensorflow as tf
import numpy as np
from tensorflow.python.ops import ctc_ops

try:
    try:
        with tf.device('/CPU'):
            arg_0_tensor = tf.random.uniform([2, 4], minval=-256, maxval=257, dtype=tf.int32)
            arg_0 = tf.identity(arg_0_tensor)
            arg_1 = 125091515651
            arg_2_tensor = tf.random.uniform([3, 2, 10], dtype=tf.float32)
            arg_2 = tf.identity(arg_2_tensor)
            out = ctc_ops._state_to_olabel(arg_0, arg_1, arg_2)
    except Exception as e:
        print("Error:" + str(e))
    try:
        with tf.device('/GPU:0'):
            arg_0 = tf.identity(arg_0_tensor)
            arg_0 = tf.cast(arg_0, tf.int32)
            arg_2 = tf.identity(arg_2_tensor)
            arg_2 = tf.cast(arg_2, tf.float32)
            ctc_ops._state_to_olabel(arg_0, arg_1, arg_2)
    except Exception as e:
        print("Error:" + str(e))
except Exception as e:
    print("Error:" + str(e))
```
### Relevant log output
```shell
2023-01-21 18:41:56.825881: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-21 18:41:57.928041: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/lib/:/home/nimashiri/anaconda3/envs/cuda11.2/lib/
2023-01-21 18:41:57.928214: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/lib/:/home/nimashiri/anaconda3/envs/cuda11.2/lib/
2023-01-21 18:41:57.928221: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-01-21 18:41:58.831767: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-21 18:41:58.856337: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-21 18:41:58.856463: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-21 18:41:58.857603: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-21 18:41:58.858581: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-21 18:41:58.858727: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-21 18:41:58.858843: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-21 18:41:59.490418: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-21 18:41:59.490746: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-21 18:41:59.490844: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-21 18:41:59.491116: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4278 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5
2023-01-21 18:41:59.524171: W tensorflow/tsl/framework/cpu_allocator_impl.cc:82] Allocation of 17198850112 exceeds 10% of free system memory.
Killed
```
Hi @nimashiri ,
- The error message "Allocation of 17198850112 exceeds 10% of free system memory." indicates that the system has attempted to allocate more memory than is currently available.
- The message "Killed" indicates that the process was terminated by the operating system, which kills the process in order to free up memory and prevent the system from crashing.
- To resolve this issue, optimize your code to reduce memory usage by decreasing the second argument (num_labels); a small-scale sketch of the call is shown below. Please find the gist here after reducing the num_labels argument. Thank you!
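As a reference point, here is a minimal sketch of the same call with a small num_labels. It assumes the private helper `_state_to_olabel(labels, num_labels, states)` keeps its intermediate tensor proportional to num_labels, which the OOM'd shape [3, 2, 4, 537464066] in the log above suggests. The concrete values (num_labels = 5, labels drawn from [1, 5)) are illustrative and not taken from the original report.

```python
import tensorflow as tf
from tensorflow.python.ops import ctc_ops

# labels: [batch=2, max_label_len=4]; states: [frames=3, batch=2, num_states=10]
labels = tf.random.uniform([2, 4], minval=1, maxval=5, dtype=tf.int32)
states = tf.random.uniform([3, 2, 10], dtype=tf.float32)
num_labels = 5  # small illustrative value instead of 125091515651

out = ctc_ops._state_to_olabel(labels, num_labels, states)
print(out.shape)  # (3, 2, 5): per-frame, per-batch output-label scores
```

With a modest num_labels the call should complete with negligible memory, whereas the original num_labels of 125091515651 forces a tensor tens of gigabytes in size.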
This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.
@learning-to-play
This needs further investigation into whether the crash is due to OOM or another issue. Without debug info, it is hard to identify the culprit just by looking at it.
@dmc1778, We tested the code on Ubuntu; it raises an error without crashing on both CPU and GPU. Please check the errors below.
CPU:
Error:{{function_node __wrapped__Exp_device_/job:localhost/replica:0/task:0/device:CPU:0}} OOM when allocating tensor with shape[3,2,4,537464066] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator mklcpu [Op:Exp]
GPU:
Error:{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:GPU:0}} failed to allocate memory [Op:AddV2]
Kindly find the attached logs below:
```shell
(tf2.11) suryanarayanay@surya-ubuntu-22-04:~$ python 59383.py
2023-02-08 10:09:05.681349: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-08 10:09:05.803760: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-02-08 10:09:06.550513: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/suryanarayanay/miniconda3/envs/tf2.11/lib/
2023-02-08 10:09:06.550611: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/suryanarayanay/miniconda3/envs/tf2.11/lib/
2023-02-08 10:09:06.550634: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2.11.0
2023-02-08 10:09:11.446744: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-08 10:09:12.777747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 38235 MB memory: -> device: 0, name: NVIDIA A100-SXM4-40GB, pci bus id: 0000:00:04.0, compute capability: 8.0
2023-02-08 10:09:12.779304: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 38235 MB memory: -> device: 1, name: NVIDIA A100-SXM4-40GB, pci bus id: 0000:00:05.0, compute capability: 8.0
2023-02-08 10:09:34.824391: W tensorflow/tsl/framework/bfc_allocator.cc:479] Allocator (mklcpu) ran out of memory trying to allocate 48.05GiB (rounded to 51596550400)requested by op Exp
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Current allocation summary follows.
2023-02-08 10:09:34.824437: I tensorflow/tsl/framework/bfc_allocator.cc:1034] BFCAllocator dump for mklcpu
2023-02-08 10:09:34.824506: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (256): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824517: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824524: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (1024): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824530: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824542: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824551: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824560: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824566: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (32768): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824573: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824585: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (131072): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824591: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (262144): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824599: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (524288): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824606: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (1048576): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824613: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824620: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (4194304): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824628: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (8388608): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824638: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824646: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824653: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824660: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824670: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (268435456): Total Chunks: 8, Chunks in use: 5. 160.00GiB allocated for chunks. 136.15GiB in use in bin. 136.15GiB client-requested in use in bin.
2023-02-08 10:09:34.824681: I tensorflow/tsl/framework/bfc_allocator.cc:1057] Bin for 48.05GiB was 256.00MiB, Chunk State:
2023-02-08 10:09:34.824694: I tensorflow/tsl/framework/bfc_allocator.cc:1063] Size: 3.93GiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev: Size: 48.05GiB | Requested Size: 48.05GiB | in_use: 1 | bin_num: -1
2023-02-08 10:09:34.824705: I tensorflow/tsl/framework/bfc_allocator.cc:1063] Size: 3.93GiB | Requested Size: 3.00GiB | in_use: 0 | bin_num: 20, prev: Size: 12.01GiB | Requested Size: 12.01GiB | in_use: 1 | bin_num: -1
2023-02-08 10:09:34.824719: I tensorflow/tsl/framework/bfc_allocator.cc:1063] Size: 15.98GiB | Requested Size: 12.01GiB | in_use: 0 | bin_num: 20, prev: Size: 16.02GiB | Requested Size: 16.02GiB | in_use: 1 | bin_num: -1
2023-02-08 10:09:34.824733: I tensorflow/tsl/framework/bfc_allocator.cc:1070] Next region of size 68719476736
2023-02-08 10:09:34.824743: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f421fffb040 of size 12899137792 next 8
2023-02-08 10:09:34.824751: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f4520d8e940 of size 51596550400 next 5
2023-02-08 10:09:34.824759: I tensorflow/tsl/framework/bfc_allocator.cc:1090] Free at 7f51243dca40 of size 4223788544 next 18446744073709551615
2023-02-08 10:09:34.824767: I tensorflow/tsl/framework/bfc_allocator.cc:1070] Next region of size 68719476736
2023-02-08 10:09:34.824775: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f529fffd040 of size 51596550400 next 3
2023-02-08 10:09:34.824782: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f5ea364b140 of size 12899137792 next 4
2023-02-08 10:09:34.824790: I tensorflow/tsl/framework/bfc_allocator.cc:1090] Free at 7f61a43dea40 of size 4223788544 next 18446744073709551615
2023-02-08 10:09:34.824798: I tensorflow/tsl/framework/bfc_allocator.cc:1070] Next region of size 34359738368
2023-02-08 10:09:34.824807: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f62dffff040 of size 17198850304 next 1
2023-02-08 10:09:34.824815: I tensorflow/tsl/framework/bfc_allocator.cc:1090] Free at 7f66e1219140 of size 17160888064 next 18446744073709551615
2023-02-08 10:09:34.824823: I tensorflow/tsl/framework/bfc_allocator.cc:1095] Summary of in-use Chunks by size:
2023-02-08 10:09:34.824832: I tensorflow/tsl/framework/bfc_allocator.cc:1098] 2 Chunks of size 12899137792 totalling 24.03GiB
2023-02-08 10:09:34.824839: I tensorflow/tsl/framework/bfc_allocator.cc:1098] 1 Chunks of size 17198850304 totalling 16.02GiB
2023-02-08 10:09:34.824847: I tensorflow/tsl/framework/bfc_allocator.cc:1098] 2 Chunks of size 51596550400 totalling 96.11GiB
2023-02-08 10:09:34.824856: I tensorflow/tsl/framework/bfc_allocator.cc:1102] Sum Total of in-use chunks: 136.15GiB
2023-02-08 10:09:34.824864: I tensorflow/tsl/framework/bfc_allocator.cc:1104] total_region_allocated_bytes_: 171798691840 memory_limit_: 179366748160 available bytes: 7568056320 curr_region_allocation_bytes_: 137438953472
2023-02-08 10:09:34.824877: I tensorflow/tsl/framework/bfc_allocator.cc:1110] Stats:
Limit: 179366748160
InUse: 146190226688
MaxInUse: 146190226688
NumAllocs: 7
MaxAllocSize: 51596550400
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2023-02-08 10:09:34.824896: W tensorflow/tsl/framework/bfc_allocator.cc:492] **************************************__**************************************__***********_________
2023-02-08 10:09:34.824981: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at cwise_ops_common.h:320 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[3,2,4,537464066] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator mklcpu
Error:{{function_node __wrapped__Exp_device_/job:localhost/replica:0/task:0/device:CPU:0}} OOM when allocating tensor with shape[3,2,4,537464066] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator mklcpu [Op:Exp]
2023-02-08 10:09:45.673880: W tensorflow/tsl/framework/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 48.05GiB (rounded to 51596550400)requested by op AddV2
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Current allocation summary follows.
2023-02-08 10:09:45.673927: I tensorflow/tsl/framework/bfc_allocator.cc:1034] BFCAllocator dump for GPU_0_bfc
2023-02-08 10:09:45.673939: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (256): Total Chunks: 4, Chunks in use: 4. 1.0KiB allocated for chunks. 1.0KiB in use in bin. 460B client-requested in use in bin.
2023-02-08 10:09:45.673954: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (512): Total Chunks: 1, Chunks in use: 0. 512B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.673966: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (1024): Total Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2023-02-08 10:09:45.673973: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.673982: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.673991: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.673997: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674003: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (32768): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674010: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674020: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (131072): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674029: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (262144): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674036: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (524288): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674048: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (1048576): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674054: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (2097152): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674063: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (4194304): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674070: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (8388608): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674077: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674088: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674100: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674109: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674120: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (268435456): Total Chunks: 2, Chunks in use: 1. 37.34GiB allocated for chunks. 16.02GiB in use in bin. 16.02GiB client-requested in use in bin.
2023-02-08 10:09:45.674128: I tensorflow/tsl/framework/bfc_allocator.cc:1057] Bin for 48.05GiB was 256.00MiB, Chunk State:
2023-02-08 10:09:45.674141: I tensorflow/tsl/framework/bfc_allocator.cc:1063] Size: 21.32GiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev: Size: 16.02GiB | Requested Size: 16.02GiB | in_use: 1 | bin_num: -1
2023-02-08 10:09:45.674158: I tensorflow/tsl/framework/bfc_allocator.cc:1070] Next region of size 40092303360
2023-02-08 10:09:45.674169: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f3848000000 of size 1280 next 1
2023-02-08 10:09:45.674176: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f3848000500 of size 256 next 2
2023-02-08 10:09:45.674185: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f3848000600 of size 256 next 3
2023-02-08 10:09:45.674193: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f3848000700 of size 256 next 4
2023-02-08 10:09:45.674200: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f3848000800 of size 256 next 5
2023-02-08 10:09:45.674208: I tensorflow/tsl/framework/bfc_allocator.cc:1090] Free at 7f3848000900 of size 512 next 7
2023-02-08 10:09:45.674216: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f3848000b00 of size 17198850304 next 8
2023-02-08 10:09:45.674224: I tensorflow/tsl/framework/bfc_allocator.cc:1090] Free at 7f3c4921ac00 of size 22893450240 next 18446744073709551615
2023-02-08 10:09:45.674232: I tensorflow/tsl/framework/bfc_allocator.cc:1095] Summary of in-use Chunks by size:
2023-02-08 10:09:45.674244: I tensorflow/tsl/framework/bfc_allocator.cc:1098] 4 Chunks of size 256 totalling 1.0KiB
2023-02-08 10:09:45.674255: I tensorflow/tsl/framework/bfc_allocator.cc:1098] 1 Chunks of size 1280 totalling 1.2KiB
2023-02-08 10:09:45.674265: I tensorflow/tsl/framework/bfc_allocator.cc:1098] 1 Chunks of size 17198850304 totalling 16.02GiB
2023-02-08 10:09:45.674273: I tensorflow/tsl/framework/bfc_allocator.cc:1102] Sum Total of in-use chunks: 16.02GiB
2023-02-08 10:09:45.674281: I tensorflow/tsl/framework/bfc_allocator.cc:1104] total_region_allocated_bytes_: 40092303360 memory_limit_: 40092303360 available bytes: 0 curr_region_allocation_bytes_: 80184606720
2023-02-08 10:09:45.674293: I tensorflow/tsl/framework/bfc_allocator.cc:1110] Stats:
Limit: 40092303360
InUse: 17198852608
MaxInUse: 17198853120
NumAllocs: 8
MaxAllocSize: 17198850304
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2023-02-08 10:09:45.674309: W tensorflow/tsl/framework/bfc_allocator.cc:492] *******************************************_________________________________________________________
2023-02-08 10:09:45.677575: W tensorflow/core/framework/op_kernel.cc:1818] RESOURCE_EXHAUSTED: failed to allocate memory
Error:{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:GPU:0}} failed to allocate memory [Op:AddV2]
code execution completed
```
Hi, this issue is working as intended; the OOM is happening due to the system hardware configuration.
This issue was caused by ctc_ops._state_to_olabel being called with a very large num_labels value (125091515651), which ended up needing about 48 GB of memory. The OOM error is expected when that much memory is not available, and there is no security issue here.
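For completeness, here is a rough back-of-the-envelope check of the reported allocation, based only on the shape and sizes in the logs above; the 32-bit truncation of num_labels is an inference, not something confirmed elsewhere in the thread.

```python
# Shape reported in the OOM message: [3, 2, 4, 537464066], float32 (4 bytes each).
elements = 3 * 2 * 4 * 537464066
print(elements * 4)           # 51596550336 bytes
print(elements * 4 / 2**30)   # ~48.05 GiB, matching the allocator warning

# 537464066 appears to be (125091515651 - 1) truncated to 32 bits, i.e. the
# depth of the one-hot tensor derived from the oversized num_labels argument.
print((125091515651 - 1) % 2**32)  # 537464066
```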
This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.
Closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!