bfc_allocator: Check failed: BinForSize(bin_size) == BinFromIndex(b)
% python3 tests/test_TFEngine.py test_engine_train
Installed libSegFault.so.
TF version: 1.14.0
2020-06-09 14:03:19.232834: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2020-06-09 14:03:19.255880: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192945000 Hz
2020-06-09 14:03:19.256060: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3287610 executing computations on platform Host. Devices:
2020-06-09 14:03:19.256076: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2020-06-09 14:03:19.271257: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-06-09 14:03:19.365655: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-09 14:03:19.367278: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x33553b0 executing computations on platform CUDA. Devices:
2020-06-09 14:03:19.367303: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 980, Compute Capability 5.2
2020-06-09 14:03:19.367441: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-09 14:03:19.368958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 980 major: 5 minor: 2 memoryClockRate(GHz): 1.266
pciBusID: 0000:01:00.0
2020-06-09 14:03:19.395282: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2020-06-09 14:03:19.587270: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
2020-06-09 14:03:19.740996: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10
2020-06-09 14:03:20.319288: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10
2020-06-09 14:03:20.640825: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10
2020-06-09 14:03:20.872985: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10
2020-06-09 14:03:21.500073: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-06-09 14:03:21.500219: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-09 14:03:21.501807: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-09 14:03:21.503303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2020-06-09 14:03:21.503347: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2020-06-09 14:03:21.505274: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-09 14:03:21.505297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2020-06-09 14:03:21.505306: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2020-06-09 14:03:21.505661: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-09 14:03:21.507224: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-09 14:03:21.508735: F tensorflow/core/common_runtime/bfc_allocator.cc:61] Check failed: BinForSize(bin_size) == BinFromIndex(b) (0x37b30e8 vs. 0x37b2c88)
Fatal Python error: Aborted
Current thread 0x00007fbb7620e700 (most recent call first):
File "/work/tools/asr/python/3.7.1_tf_1.14-generic+cuda10.1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 693 in __init__
File "/work/tools/asr/python/3.7.1_tf_1.14-generic+cuda10.1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1732 in __init__
File "tests/test_TFEngine.py", line 100 in <module>
*** Aborted
Line 100 in test_TFEngine.py is this:
session = tf.InteractiveSession()
I'm getting this error, and Google does not return any results for "Check failed: BinForSize(bin_size) == BinFromIndex(b)", so I'm posting it here (in case anyone else ever sees this).
I'm probably doing something wrong...
@curufinwe maybe? This is with your Python 3.7 env with TF 1.14, on my sulfid desktop. It does not seem to occur on the cluster.
I know this is an old issue, but I just hit this error as well and have an idea about the cause: it happens on some older Intel CPUs when TensorFlow is compiled with -march=barcelona, which we do in order to run on old AMD Opteron CPUs. I got the error on Core 2 Duo machines and on 2nd-generation i3/i5 machines, but not on 4th-generation i5 machines or newer.
Or maybe again some issue with GCC 5?
Yes, could be.
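In case it helps anyone checking the -march=barcelona idea on their own machine, here is a minimal sketch that lists which of the relevant CPU extensions the host does not report. Assumptions: Linux with /proc/cpuinfo; the flag names (sse4a, abm, 3dnow, 3dnowext) are the Linux names for extensions that the GCC docs list under -march=barcelona; the helper names parse_cpu_flags and missing_flags are made up for this example. A missing flag only matters if the compiler actually emitted such instructions, so this is a hint, not proof. One unverified guess: abm (LZCNT) is the flag that would differ between the failing and working Intel generations mentioned above, since Intel added LZCNT with Haswell (4th generation).

```python
from pathlib import Path

# Linux /proc/cpuinfo flag names for extensions that GCC documents as
# enabled by -march=barcelona (assumption: this subset is the relevant one).
BARCELONA_FLAGS = ("sse4a", "abm", "3dnow", "3dnowext")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, required=BARCELONA_FLAGS):
    """Return the required flags that the CPU does not report, in order."""
    flags = parse_cpu_flags(cpuinfo_text)
    return [flag for flag in required if flag not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("not reported by this CPU:", missing_flags(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not found (not Linux?)")
```

Running this on a machine where the check fails and on one where it does not, and comparing the output, should show whether the two differ in any of these flags.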