RuntimeError('cannot join current thread',) in <object repr() failed>

Open SeekPoint opened this issue 5 years ago • 1 comment

(.venv) ub16c9@ub16c9-gpu:~/ub16_prj/QANet$ python config.py --mode train
Building model...
WARNING:tensorflow:From /home/ub16c9/ub16_prj/QANet/layers.py:52: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /home/ub16c9/ub16_prj/QANet/model.py:134: calling softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.
Instructions for updating:
dim is deprecated, use axis instead
WARNING:tensorflow:From /home/ub16c9/ub16_prj/QANet/model.py:174: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

Total number of trainable parameters: 788673
2018-12-29 11:14:48.345129: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-29 11:14:48.431530: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-29 11:14:48.431955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6575
pciBusID: 0000:01:00.0
totalMemory: 10.92GiB freeMemory: 10.43GiB
2018-12-29 11:14:48.431971: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-29 11:14:48.733045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-29 11:14:48.733079: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-29 11:14:48.733085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-29 11:14:48.733318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10086 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-12-29 11:14:50.042331: W tensorflow/core/framework/allocator.cc:122] Allocation of 109906800 exceeds 10% of system memory.
2018-12-29 11:14:50.174758: W tensorflow/core/framework/allocator.cc:122] Allocation of 109906800 exceeds 10% of system memory.
2018-12-29 11:14:50.507489: W tensorflow/core/framework/allocator.cc:122] Allocation of 109906800 exceeds 10% of system memory.
2018-12-29 11:14:50.691090: W tensorflow/core/framework/allocator.cc:122] Allocation of 109906800 exceeds 10% of system memory.
2018-12-29 11:14:50.825623: W tensorflow/core/framework/allocator.cc:122] Allocation of 109906800 exceeds 10% of system memory.
55%|██████████████████████████████████████████████████████████████████████████████████████▏ | 32935/60000 [3:15:35<2:19:53, 3.22it/s]
90%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 53999/60000 [5:17:29<29:48, 3.36it/s
Exception RuntimeError: RuntimeError('cannot join current thread',) in <object repr() failed> ignored
██████████████████████████████████████████████████████████████████████| 328/328 [00:36<00:00, 9.07it/s]
(.venv) ub16c9@ub16c9-gpu:~/ub16_prj/QANet$

SeekPoint • Dec 29 '18 08:12

I ran into the same problem. Reducing the batch size fixed it, or just running it again did. It only happens occasionally.

natureLanguageQing • Apr 24 '19 07:04
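
For anyone wondering where the message itself comes from: Python's standard `threading` module raises `RuntimeError('cannot join current thread')` when a thread calls `join()` on itself. The "Exception ... ignored" wrapper in the log suggests this happened during object cleanup rather than in the training code proper, but that is an inference from the log, not something confirmed in this thread. Below is a minimal sketch that reproduces the message using only the standard library and shows a guard against self-joining. `safe_join` and `reproduce_message` are hypothetical helper names for illustration and are not part of QANet or TensorFlow.

```python
import threading


def safe_join(thread, timeout=None):
    """Join `thread` unless it is the calling thread (hypothetical helper).

    Returns True if join() was called, False if it was skipped.
    """
    if thread is threading.current_thread():
        # Joining ourselves would raise RuntimeError, so skip it.
        return False
    thread.join(timeout)
    return True


def reproduce_message():
    """Return the message Python raises when a thread joins itself."""
    try:
        threading.current_thread().join()
    except RuntimeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce_message())

    worker = threading.Thread(target=lambda: None)
    worker.start()
    print(safe_join(worker))                      # joins the worker normally
    print(safe_join(threading.current_thread()))  # skipped, no exception
```

This does not explain why a smaller batch size or a rerun makes the error go away. It only shows what condition produces the message, which may help narrow down which background thread is being joined at shutdown.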