
bfc_allocator: Check failed: BinForSize(bin_size) == BinFromIndex(b)

Open albertz opened this issue 4 years ago • 4 comments

 % python3 tests/test_TFEngine.py test_engine_train
Installed libSegFault.so.
TF version: 1.14.0
2020-06-09 14:03:19.232834: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2020-06-09 14:03:19.255880: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192945000 Hz
2020-06-09 14:03:19.256060: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3287610 executing computations on platform Host. Devices:
2020-06-09 14:03:19.256076: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2020-06-09 14:03:19.271257: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-06-09 14:03:19.365655: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-09 14:03:19.367278: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x33553b0 executing computations on platform CUDA. Devices:
2020-06-09 14:03:19.367303: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce GTX 980, Compute Capability 5.2
2020-06-09 14:03:19.367441: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-09 14:03:19.368958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 980 major: 5 minor: 2 memoryClockRate(GHz): 1.266
pciBusID: 0000:01:00.0
2020-06-09 14:03:19.395282: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2020-06-09 14:03:19.587270: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
2020-06-09 14:03:19.740996: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10
2020-06-09 14:03:20.319288: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10
2020-06-09 14:03:20.640825: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10
2020-06-09 14:03:20.872985: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10
2020-06-09 14:03:21.500073: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-06-09 14:03:21.500219: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-09 14:03:21.501807: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-09 14:03:21.503303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2020-06-09 14:03:21.503347: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2020-06-09 14:03:21.505274: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-09 14:03:21.505297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2020-06-09 14:03:21.505306: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2020-06-09 14:03:21.505661: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-09 14:03:21.507224: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-09 14:03:21.508735: F tensorflow/core/common_runtime/bfc_allocator.cc:61] Check failed: BinForSize(bin_size) == BinFromIndex(b) (0x37b30e8 vs. 0x37b2c88)
Fatal Python error: Aborted

Current thread 0x00007fbb7620e700 (most recent call first):
  File "/work/tools/asr/python/3.7.1_tf_1.14-generic+cuda10.1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 693 in __init__
  File "/work/tools/asr/python/3.7.1_tf_1.14-generic+cuda10.1/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1732 in __init__
  File "tests/test_TFEngine.py", line 100 in <module>
*** Aborted

Line 100 in test_TFEngine.py is: session = tf.InteractiveSession()

I'm getting this error, and Google returns no results for "Check failed: BinForSize(bin_size) == BinFromIndex(b)", so I'm posting it here in case anyone else ever sees it.
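
For readers who land here from a search: the failing check appears to be a round-trip invariant in the BFC allocator's constructor, where each bin's size must map back to that same bin. Below is a simplified Python model of that invariant, not the TensorFlow code itself. The constants (21 bins, 256-byte minimum) and the formulas are recalled from the TF 1.x source and should be treated as assumptions; `inverted_log2` is a purely hypothetical miscomputation added for illustration.

```python
# Simplified Python model of the bin bookkeeping checked at bfc_allocator.cc:61.
# Constants and formulas are recalled from the TF 1.x source, not copied from
# it; treat them as assumptions.

K_NUM_BINS = 21            # assumed number of bins
K_MIN_ALLOCATION_BITS = 8  # assumed: the smallest bin holds 256-byte chunks


def log2_floor(n):
    """Correct floor(log2(n)) for n > 0."""
    return n.bit_length() - 1


def bin_num_to_size(index):
    """Smallest chunk size that belongs to bin `index`."""
    return 256 << index


def bin_num_for_size(num_bytes, log2=log2_floor):
    """Bin index for a chunk of `num_bytes`, using the given log2 function."""
    v = max(num_bytes, 256) >> K_MIN_ALLOCATION_BITS
    return min(K_NUM_BINS - 1, log2(v))


def first_failing_bin(log2=log2_floor):
    """Return the first bin index whose size does not map back to it, or None."""
    for b in range(K_NUM_BINS):
        if bin_num_for_size(bin_num_to_size(b), log2) != b:
            return b
    return None


def inverted_log2(n):
    """Hypothetical miscomputed log2 (63 minus the correct value)."""
    return 63 - log2_floor(n)


if __name__ == "__main__":
    print(first_failing_bin())                   # invariant holds: None
    print(first_failing_bin(inverted_log2))      # fails at the first bin: 0
    print(bin_num_for_size(256, inverted_log2))  # lands in the last bin: 20
    print(0x37b30e8 - 0x37b2c88)                 # pointer gap in the log: 1120
```

With a correct log2 the invariant holds for every bin, so the abort suggests the bit-scan arithmetic itself went wrong in this binary. The two pointers in the message are 1120 bytes apart. If one bin entry occupied 56 bytes (an unverified assumption), that would be 20 entries, which is what the model produces when the lookup for the smallest size ends up in the last bin.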

I'm probably doing something wrong...

albertz, Jun 09 '20 12:06

@curufinwe maybe? This is with your Python 3.7 env with TF 1.14, on my sulfid desktop. It does not seem to occur on the cluster.

albertz, Jun 09 '20 13:06

I know this is an old issue, but I just hit this error as well and have an idea about the cause: it happens on some older Intel CPUs when TensorFlow is compiled with -march=barcelona, which we do in order to run on old AMD Opteron CPUs. I got this error on Core 2 Duo machines and on 2nd-generation i3/i5 machines, but not on 4th-generation i5 machines or newer.
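
A quick way to test this theory on a given Linux machine is to compare the CPU's advertised flags against the extensions that GCC documents for -march=barcelona (SSE4A, 3DNow!, enhanced 3DNow!, ABM, among others). The sketch below assumes the usual /proc/cpuinfo spellings, where `abm` covers LZCNT; the helper names and the flag list are illustrative, not part of RETURNN or TensorFlow.

```python
import os

# Extensions GCC lists for -march=barcelona beyond baseline SSE/SSE2/SSE3,
# spelled as Linux /proc/cpuinfo reports them ("abm" covers LZCNT).
# This mapping is an assumption made for this sketch.
BARCELONA_FLAGS = ("abm", "sse4a", "3dnow", "3dnowext")


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, required=BARCELONA_FLAGS):
    """List the required flags that the CPU does not advertise."""
    present = cpu_flags(cpuinfo_text)
    return [flag for flag in required if flag not in present]


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # Linux only
    if os.path.exists(path):
        with open(path) as f:
            print("missing:", missing_flags(f.read()))
```

A missing `abm` would be the interesting case. Intel CPUs before Haswell (4th generation) do not implement LZCNT and decode its encoding as BSR instead, which returns a different result without faulting. That would match both the silent miscalculation and the observation that 4th-generation machines are unaffected, but it remains a hypothesis until someone inspects the compiled binary.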

JackTemaki, Jan 05 '22 11:01

Or maybe it's some issue with GCC 5 again?

albertz, Jan 05 '22 11:01

Yes, could be.

JackTemaki, Jan 05 '22 11:01