
Process gets killed when running tensorflow.python.ops.ctc_ops._state_to_olabel

Open dmc1778 opened this issue 2 years ago • 6 comments


Issue Type

Bug

Have you reproduced the bug with TF nightly?

No

Source

binary

Tensorflow Version

2.11.0

Custom Code

Yes

OS Platform and Distribution

Ubuntu 22.04

Mobile device

No response

Python version

3.9

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current Behaviour?

The process gets killed because of an extremely large tensor: with num_labels = 125091515651, the call tries to allocate far more memory than the system has available.

Standalone code to reproduce the issue

```python
import tensorflow as tf
from tensorflow.python.ops import ctc_ops

try:
  try:
    with tf.device('/CPU'):
      # labels: [batch=2, max_label_seq_length=4]
      arg_0_tensor = tf.random.uniform([2, 4], minval=-256, maxval=257, dtype=tf.int32)
      arg_0 = tf.identity(arg_0_tensor)
      # num_labels: deliberately huge value that triggers the failure
      arg_1 = 125091515651
      # states: [frames=3, batch=2, num_states=10]
      arg_2_tensor = tf.random.uniform([3, 2, 10], dtype=tf.float32)
      arg_2 = tf.identity(arg_2_tensor)
      out = ctc_ops._state_to_olabel(arg_0, arg_1, arg_2)
  except Exception as e:
    print("Error:" + str(e))
  try:
    with tf.device('/GPU:0'):
      arg_0 = tf.identity(arg_0_tensor)
      arg_0 = tf.cast(arg_0, tf.int32)
      arg_2 = tf.identity(arg_2_tensor)
      arg_2 = tf.cast(arg_2, tf.float32)
      ctc_ops._state_to_olabel(arg_0, arg_1, arg_2)
  except Exception as e:
    print("Error:" + str(e))
except Exception as e:
  print("Error:" + str(e))
```


Relevant log output

```shell
2023-01-21 18:41:56.825881: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-21 18:41:57.928041: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/lib/:/home/nimashiri/anaconda3/envs/cuda11.2/lib/
2023-01-21 18:41:57.928214: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/lib/:/home/nimashiri/anaconda3/envs/cuda11.2/lib/
2023-01-21 18:41:57.928221: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2023-01-21 18:41:58.831767: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-21 18:41:58.856337: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-21 18:41:58.856463: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-21 18:41:58.857603: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-01-21 18:41:58.858581: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-21 18:41:58.858727: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-21 18:41:58.858843: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-21 18:41:59.490418: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-21 18:41:59.490746: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-21 18:41:59.490844: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-01-21 18:41:59.491116: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4278 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5
2023-01-21 18:41:59.524171: W tensorflow/tsl/framework/cpu_allocator_impl.cc:82] Allocation of 17198850112 exceeds 10% of free system memory.
Killed

```

dmc1778 avatar Jan 21 '23 23:01 dmc1778

Hi @nimashiri ,

  • The error message "Allocation of 17198850112 exceeds 10% of free system memory." indicates that the process attempted to allocate more memory than is currently available.

  • The message "Killed" indicates that the offending process was terminated by the operating system, which kills it to free up memory and prevent the system from crashing.

  • To resolve this issue, optimize your code to reduce memory usage by decreasing the second argument (num_labels), as in the sketch below. Please find the gist here after reducing the num_labels value. Thank you!
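
A minimal sketch of that workaround, assuming the TF 2.11 argument order `_state_to_olabel(labels, num_labels, states)` (this is a private helper, so the signature may change between releases):

```python
import tensorflow as tf
from tensorflow.python.ops import ctc_ops

# Same shapes as the repro above, but with a small num_labels so the internal
# one-hot table ([batch, max_label_seq_length, num_labels - 1]) stays tiny
# instead of requiring tens of GiB.
labels = tf.random.uniform([2, 4], minval=0, maxval=9, dtype=tf.int32)
num_labels = 10  # instead of 125091515651
states = tf.random.uniform([3, 2, 10], dtype=tf.float32)

out = ctc_ops._state_to_olabel(labels, num_labels, states)
print(out.shape)  # (3, 2, 10) -> [frames, batch, num_labels]
```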

synandi avatar Jan 24 '23 15:01 synandi

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] avatar Feb 07 '23 12:02 google-ml-butler[bot]

@learning-to-play

This needs further investigation into whether the crash is due to OOM or another issue. Without debug info, it is hard to identify the culprit just by looking at it.

mihaimaruseac avatar Feb 07 '23 19:02 mihaimaruseac

@dmc1778, We tested the code on Ubuntu; it raises an error without crashing on both CPU and GPU. Please check the errors below.

CPU:

Error:{{function_node __wrapped__Exp_device_/job:localhost/replica:0/task:0/device:CPU:0}} OOM when allocating tensor with shape[3,2,4,537464066] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator mklcpu [Op:Exp]

GPU:

Error:{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:GPU:0}} failed to allocate memory [Op:AddV2]

Kindly find the attached logs below:

```shell
(tf2.11) suryanarayanay@surya-ubuntu-22-04:~$ python 59383.py
2023-02-08 10:09:05.681349: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-08 10:09:05.803760: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-02-08 10:09:06.550513: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/suryanarayanay/miniconda3/envs/tf2.11/lib/
2023-02-08 10:09:06.550611: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/suryanarayanay/miniconda3/envs/tf2.11/lib/
2023-02-08 10:09:06.550634: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2.11.0
2023-02-08 10:09:11.446744: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-08 10:09:12.777747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 38235 MB memory:  -> device: 0, name: NVIDIA A100-SXM4-40GB, pci bus id: 0000:00:04.0, compute capability: 8.0
2023-02-08 10:09:12.779304: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 38235 MB memory:  -> device: 1, name: NVIDIA A100-SXM4-40GB, pci bus id: 0000:00:05.0, compute capability: 8.0
2023-02-08 10:09:34.824391: W tensorflow/tsl/framework/bfc_allocator.cc:479] Allocator (mklcpu) ran out of memory trying to allocate 48.05GiB (rounded to 51596550400)requested by op Exp
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Current allocation summary follows.
2023-02-08 10:09:34.824437: I tensorflow/tsl/framework/bfc_allocator.cc:1034] BFCAllocator dump for mklcpu
2023-02-08 10:09:34.824506: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (256):        Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824517: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (512):        Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824524: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (1024):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824530: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (2048):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824542: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (4096):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824551: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (8192):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824560: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (16384):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824566: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (32768):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824573: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (65536):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824585: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (131072):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824591: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (262144):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824599: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (524288):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824606: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (1048576):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824613: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (2097152):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824620: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (4194304):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824628: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (8388608):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824638: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (16777216):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824646: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (33554432):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824653: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (67108864):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824660: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (134217728):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:34.824670: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (268435456):  Total Chunks: 8, Chunks in use: 5. 160.00GiB allocated for chunks. 136.15GiB in use in bin. 136.15GiB client-requested in use in bin.
2023-02-08 10:09:34.824681: I tensorflow/tsl/framework/bfc_allocator.cc:1057] Bin for 48.05GiB was 256.00MiB, Chunk State:
2023-02-08 10:09:34.824694: I tensorflow/tsl/framework/bfc_allocator.cc:1063]   Size: 3.93GiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev:   Size: 48.05GiB | Requested Size: 48.05GiB | in_use: 1 | bin_num: -1
2023-02-08 10:09:34.824705: I tensorflow/tsl/framework/bfc_allocator.cc:1063]   Size: 3.93GiB | Requested Size: 3.00GiB | in_use: 0 | bin_num: 20, prev:   Size: 12.01GiB | Requested Size: 12.01GiB | in_use: 1 | bin_num: -1
2023-02-08 10:09:34.824719: I tensorflow/tsl/framework/bfc_allocator.cc:1063]   Size: 15.98GiB | Requested Size: 12.01GiB | in_use: 0 | bin_num: 20, prev:   Size: 16.02GiB | Requested Size: 16.02GiB | in_use: 1 | bin_num: -1
2023-02-08 10:09:34.824733: I tensorflow/tsl/framework/bfc_allocator.cc:1070] Next region of size 68719476736
2023-02-08 10:09:34.824743: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f421fffb040 of size 12899137792 next 8
2023-02-08 10:09:34.824751: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f4520d8e940 of size 51596550400 next 5
2023-02-08 10:09:34.824759: I tensorflow/tsl/framework/bfc_allocator.cc:1090] Free  at 7f51243dca40 of size 4223788544 next 18446744073709551615
2023-02-08 10:09:34.824767: I tensorflow/tsl/framework/bfc_allocator.cc:1070] Next region of size 68719476736
2023-02-08 10:09:34.824775: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f529fffd040 of size 51596550400 next 3
2023-02-08 10:09:34.824782: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f5ea364b140 of size 12899137792 next 4
2023-02-08 10:09:34.824790: I tensorflow/tsl/framework/bfc_allocator.cc:1090] Free  at 7f61a43dea40 of size 4223788544 next 18446744073709551615
2023-02-08 10:09:34.824798: I tensorflow/tsl/framework/bfc_allocator.cc:1070] Next region of size 34359738368
2023-02-08 10:09:34.824807: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f62dffff040 of size 17198850304 next 1
2023-02-08 10:09:34.824815: I tensorflow/tsl/framework/bfc_allocator.cc:1090] Free  at 7f66e1219140 of size 17160888064 next 18446744073709551615
2023-02-08 10:09:34.824823: I tensorflow/tsl/framework/bfc_allocator.cc:1095]      Summary of in-use Chunks by size:
2023-02-08 10:09:34.824832: I tensorflow/tsl/framework/bfc_allocator.cc:1098] 2 Chunks of size 12899137792 totalling 24.03GiB
2023-02-08 10:09:34.824839: I tensorflow/tsl/framework/bfc_allocator.cc:1098] 1 Chunks of size 17198850304 totalling 16.02GiB
2023-02-08 10:09:34.824847: I tensorflow/tsl/framework/bfc_allocator.cc:1098] 2 Chunks of size 51596550400 totalling 96.11GiB
2023-02-08 10:09:34.824856: I tensorflow/tsl/framework/bfc_allocator.cc:1102] Sum Total of in-use chunks: 136.15GiB
2023-02-08 10:09:34.824864: I tensorflow/tsl/framework/bfc_allocator.cc:1104] total_region_allocated_bytes_: 171798691840 memory_limit_: 179366748160 available bytes: 7568056320 curr_region_allocation_bytes_: 137438953472
2023-02-08 10:09:34.824877: I tensorflow/tsl/framework/bfc_allocator.cc:1110] Stats:
Limit:                    179366748160
InUse:                    146190226688
MaxInUse:                 146190226688
NumAllocs:                           7
MaxAllocSize:              51596550400
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2023-02-08 10:09:34.824896: W tensorflow/tsl/framework/bfc_allocator.cc:492] **************************************__**************************************__***********_________
2023-02-08 10:09:34.824981: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at cwise_ops_common.h:320 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[3,2,4,537464066] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator mklcpu
Error:{{function_node __wrapped__Exp_device_/job:localhost/replica:0/task:0/device:CPU:0}} OOM when allocating tensor with shape[3,2,4,537464066] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator mklcpu [Op:Exp]
2023-02-08 10:09:45.673880: W tensorflow/tsl/framework/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 48.05GiB (rounded to 51596550400)requested by op AddV2
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation.
Current allocation summary follows.
Current allocation summary follows.
2023-02-08 10:09:45.673927: I tensorflow/tsl/framework/bfc_allocator.cc:1034] BFCAllocator dump for GPU_0_bfc
2023-02-08 10:09:45.673939: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (256):        Total Chunks: 4, Chunks in use: 4. 1.0KiB allocated for chunks. 1.0KiB in use in bin. 460B client-requested in use in bin.
2023-02-08 10:09:45.673954: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (512):        Total Chunks: 1, Chunks in use: 0. 512B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.673966: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (1024):       Total Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2023-02-08 10:09:45.673973: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (2048):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.673982: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (4096):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.673991: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (8192):       Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.673997: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (16384):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674003: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (32768):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674010: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (65536):      Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674020: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (131072):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674029: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (262144):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674036: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (524288):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674048: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (1048576):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674054: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (2097152):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674063: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (4194304):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674070: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (8388608):    Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674077: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (16777216):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674088: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (33554432):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674100: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (67108864):   Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674109: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (134217728):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2023-02-08 10:09:45.674120: I tensorflow/tsl/framework/bfc_allocator.cc:1041] Bin (268435456):  Total Chunks: 2, Chunks in use: 1. 37.34GiB allocated for chunks. 16.02GiB in use in bin. 16.02GiB client-requested in use in bin.
2023-02-08 10:09:45.674128: I tensorflow/tsl/framework/bfc_allocator.cc:1057] Bin for 48.05GiB was 256.00MiB, Chunk State:
2023-02-08 10:09:45.674141: I tensorflow/tsl/framework/bfc_allocator.cc:1063]   Size: 21.32GiB | Requested Size: 0B | in_use: 0 | bin_num: 20, prev:   Size: 16.02GiB | Requested Size: 16.02GiB | in_use: 1 | bin_num: -1
2023-02-08 10:09:45.674158: I tensorflow/tsl/framework/bfc_allocator.cc:1070] Next region of size 40092303360
2023-02-08 10:09:45.674169: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f3848000000 of size 1280 next 1
2023-02-08 10:09:45.674176: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f3848000500 of size 256 next 2
2023-02-08 10:09:45.674185: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f3848000600 of size 256 next 3
2023-02-08 10:09:45.674193: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f3848000700 of size 256 next 4
2023-02-08 10:09:45.674200: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f3848000800 of size 256 next 5
2023-02-08 10:09:45.674208: I tensorflow/tsl/framework/bfc_allocator.cc:1090] Free  at 7f3848000900 of size 512 next 7
2023-02-08 10:09:45.674216: I tensorflow/tsl/framework/bfc_allocator.cc:1090] InUse at 7f3848000b00 of size 17198850304 next 8
2023-02-08 10:09:45.674224: I tensorflow/tsl/framework/bfc_allocator.cc:1090] Free  at 7f3c4921ac00 of size 22893450240 next 18446744073709551615
2023-02-08 10:09:45.674232: I tensorflow/tsl/framework/bfc_allocator.cc:1095]      Summary of in-use Chunks by size:
2023-02-08 10:09:45.674244: I tensorflow/tsl/framework/bfc_allocator.cc:1098] 4 Chunks of size 256 totalling 1.0KiB
2023-02-08 10:09:45.674255: I tensorflow/tsl/framework/bfc_allocator.cc:1098] 1 Chunks of size 1280 totalling 1.2KiB
2023-02-08 10:09:45.674265: I tensorflow/tsl/framework/bfc_allocator.cc:1098] 1 Chunks of size 17198850304 totalling 16.02GiB
2023-02-08 10:09:45.674273: I tensorflow/tsl/framework/bfc_allocator.cc:1102] Sum Total of in-use chunks: 16.02GiB
2023-02-08 10:09:45.674281: I tensorflow/tsl/framework/bfc_allocator.cc:1104] total_region_allocated_bytes_: 40092303360 memory_limit_: 40092303360 available bytes: 0 curr_region_allocation_bytes_: 80184606720
2023-02-08 10:09:45.674293: I tensorflow/tsl/framework/bfc_allocator.cc:1110] Stats:
Limit:                     40092303360
InUse:                     17198852608
MaxInUse:                  17198853120
NumAllocs:                           8
MaxAllocSize:              17198850304
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2023-02-08 10:09:45.674309: W tensorflow/tsl/framework/bfc_allocator.cc:492] *******************************************_________________________________________________________
2023-02-08 10:09:45.677575: W tensorflow/core/framework/op_kernel.cc:1818] RESOURCE_EXHAUSTED: failed to allocate memory
Error:{{function_node __wrapped__AddV2_device_/job:localhost/replica:0/task:0/device:GPU:0}} failed to allocate memory [Op:AddV2]
code execution completed
```

synandi avatar Feb 09 '23 02:02 synandi

Hi, this is working as intended; the OOM happens because of the system hardware configuration. The issue was caused by ctc_ops._state_to_olabel being called with a very large num_labels (125091515651), which ends up needing about 48 GB of memory. An OOM error is expected when that much memory is not available, and there is no security issue here. (A quick size check follows below.)
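
For completeness, the 48 GB figure checks out against the logged shape [3,2,4,537464066] (a sketch; the int32 wraparound of num_labels is an assumption inferred from that shape):

```python
# The failing Exp runs over a float32 tensor of the logged shape
# [frames=3, batch=2, max_label_seq=4, depth=537464066].
assert (125091515651 - 1) % 2**32 == 537464066  # assumed int32 wraparound

n_bytes = 3 * 2 * 4 * 537464066 * 4  # 4 bytes per float32 element
print(n_bytes, n_bytes / 2**30)      # 51596550336, ~48.05 GiB; the allocator log says
                                     # "trying to allocate 48.05GiB (rounded to 51596550400)"
```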

sachinprasadhs avatar Feb 16 '23 20:02 sachinprasadhs

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] avatar Feb 23 '23 21:02 google-ml-butler[bot]

Closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!

sachinprasadhs avatar Mar 25 '23 02:03 sachinprasadhs
