transformer-xl
CUBLAS_STATUS_EXECUTION_FAILED and Blas GEMM launch failed
I am using the required TensorFlow 1.12 and Python 2.7, but the following errors are still raised. I wonder if you could help me. By the way, several sources online suggest that CUBLAS_STATUS_EXECUTION_FAILED is raised when the TensorFlow version does not match the CUDA version. Could you please tell me which GPU type and CUDA version you used for training? Looking forward to your reply.
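Since the question comes down to comparing environments, here is a minimal sketch for collecting the relevant details on either side: GPU model, driver, CUDA toolkit, and the TensorFlow build. It assumes `nvidia-smi` and `nvcc` are on the PATH and reports `None` for anything it cannot find. The helper names `run` and `collect_env` are hypothetical and not part of transformer-xl. It should run under both Python 2.7 and Python 3.

```python
import subprocess


def run(cmd):
    """Hypothetical helper: return a command's stripped output, or None if it fails."""
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        # Command not installed, not on PATH, or exited with a non-zero status.
        return None
    return out.decode("utf-8", "replace").strip()


def collect_env():
    """Hypothetical helper: gather GPU, driver, CUDA and TensorFlow details."""
    info = {
        # GPU model and driver version as reported by the NVIDIA driver.
        "gpu": run(["nvidia-smi", "--query-gpu=name,driver_version",
                    "--format=csv,noheader"]),
        # CUDA toolkit compiler found on PATH.
        "nvcc": run(["nvcc", "--version"]),
        "tensorflow": None,
        "tf_built_with_cuda": None,
    }
    try:
        import tensorflow as tf
        info["tensorflow"] = tf.__version__
        info["tf_built_with_cuda"] = tf.test.is_built_with_cuda()
    except ImportError:
        pass  # TensorFlow not installed in this interpreter.
    return info


if __name__ == "__main__":
    for key, value in sorted(collect_env().items()):
        print("%s: %s" % (key, value))
```

Note that `nvcc --version` reports the toolkit found on the PATH, which is not necessarily the CUDA version the installed TensorFlow package was built against.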
```
2021-09-08 23:23:08.258752: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "train_gpu.py", line 475, in
Caused by op u'transformer/layer_0/rel_attn/qkv/Tensordot/MatMul', defined at:
  File "train_gpu.py", line 475, in
    return self.__call__(inputs, *args, **kwargs)
  File "/home/caoyq/anaconda3/envs/tensorflow_cp27/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 374, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/home/caoyq/anaconda3/envs/tensorflow_cp27/lib/python2.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 757, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/home/caoyq/anaconda3/envs/tensorflow_cp27/lib/python2.7/site-packages/tensorflow/python/keras/layers/core.py", line 963, in call
    outputs = standard_ops.tensordot(inputs, self.kernel, [[rank - 1], [0]])
  File "/home/caoyq/anaconda3/envs/tensorflow_cp27/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 2985, in tensordot
    ab_matmul = matmul(a_reshape, b_reshape)
  File "/home/caoyq/anaconda3/envs/tensorflow_cp27/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 2057, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/home/caoyq/anaconda3/envs/tensorflow_cp27/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4560, in mat_mul
    name=name)
  File "/home/caoyq/anaconda3/envs/tensorflow_cp27/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/caoyq/anaconda3/envs/tensorflow_cp27/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/caoyq/anaconda3/envs/tensorflow_cp27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/caoyq/anaconda3/envs/tensorflow_cp27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(31424, 1024), b.shape=(1024, 3072), m=31424, n=3072, k=1024
	 [[node transformer/layer_0/rel_attn/qkv/Tensordot/MatMul (defined at /home/caoyq/transformer-xl-master/tf/model.py:54) = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](transformer/layer_0/rel_attn/qkv/Tensordot/Reshape, transformer/layer_0/rel_attn/qkv/kernel/read)]]
```
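For scale, the shapes in the error message can be turned into buffer sizes. This is a rough sketch, assuming float32 operands (the node is `MatMul[T=DT_FLOAT]`). It counts only the three buffers of this one GEMM (input, kernel, output), not the rest of the model's memory.

```python
# Shapes from the "Blas GEMM launch failed" message: a is (m, k), b is (k, n).
M, N, K = 31424, 3072, 1024
BYTES_PER_FLOAT32 = 4  # the op is MatMul[T=DT_FLOAT]


def mib(num_elements):
    """Size in MiB of a float32 buffer with the given number of elements."""
    return num_elements * BYTES_PER_FLOAT32 / (1024.0 ** 2)


a_mib = mib(M * K)     # reshaped layer input
b_mib = mib(K * N)     # qkv kernel
c_mib = mib(M * N)     # MatMul output
flops = 2 * M * N * K  # one multiply and one add per inner-product term

print("a: %.2f MiB, b: %.2f MiB, out: %.2f MiB" % (a_mib, b_mib, c_mib))
print("GEMM work: %.1f GFLOP" % (flops / 1e9))
```

The three buffers come to about 503 MiB (122.75 + 12 + 368.25) for the layer-0 QKV projection alone. That number may be worth comparing against the free memory reported on the GPU.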