AnimeGANv2
Cannot train on an NVIDIA A4000 GPU
Running the command python train.py --dataset Hayao --epoch 101 --init_epoch 10 fails during training with the error failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED. How can this be resolved?
Software versions:
packages in environment at C:\ProgramData\Anaconda3\envs\py36:
Name                      Version          Build             Channel
absl-py                   1.0.0            pypi_0            pypi
astor                     0.8.1            pypi_0            pypi
cached-property           1.5.2            pypi_0            pypi
certifi                   2021.5.30        py36haa95532_0
colorama                  0.4.4            pypi_0            pypi
cudatoolkit               10.0.130         0
cudnn                     7.6.0            cuda10.0_0
dataclasses               0.8              pypi_0            pypi
gast                      0.2.2            pypi_0            pypi
google-pasta              0.2.0            pypi_0            pypi
grpcio                    1.44.0           pypi_0            pypi
h5py                      3.1.0            pypi_0            pypi
importlib-metadata        4.8.3            pypi_0            pypi
keras-applications        1.0.8            pypi_0            pypi
keras-preprocessing       1.1.2            pypi_0            pypi
markdown                  3.3.6            pypi_0            pypi
numpy                     1.19.5           pypi_0            pypi
opencv-python             4.5.5.62         pypi_0            pypi
opt-einsum                3.3.0            pypi_0            pypi
pip                       21.2.2           py36haa95532_0
protobuf                  3.19.4           pypi_0            pypi
python                    3.6.2            h09676a0_15
setuptools                58.0.4           py36haa95532_0
six                       1.16.0           pypi_0            pypi
tensorboard               1.15.0           pypi_0            pypi
tensorflow-estimator      1.15.1           pypi_0            pypi
tensorflow-gpu            1.15.0           pypi_0            pypi
termcolor                 1.1.0            pypi_0            pypi
tqdm                      4.62.3           pypi_0            pypi
typing-extensions         4.1.1            pypi_0            pypi
vc                        14.2             h21ff451_1
vs2015_runtime            14.27.29016      h5e58377_2
werkzeug                  2.0.3            pypi_0            pypi
wheel                     0.37.1           pyhd3eb1b0_0
wincertstore              0.2              py36h7fe50ca_0
wrapt                     1.13.3           pypi_0            pypi
zipp                      3.6.0            pypi_0            pypi
Error log:
2022-02-23 11:58:11.480753: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2022-02-23 11:58:11.483857: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2022-02-23 11:58:11.483905: I tensorflow/stream_executor/stream.cc:4976] [stream=000001E1AF327560,impl=000001E1A391B2F0] did not memset GPU location; source: 000000B3A7DCBCE8; size: 8388608; pattern: ffffffff
2022-02-23 11:58:11.486156: I tensorflow/stream_executor/stream.cc:4976] [stream=000001E1AF327560,impl=000001E1A391B2F0] did not memset GPU location; source: 000000B3A760C038; size: 8388608; pattern: ffffffff
2022-02-23 11:58:11.490415: I tensorflow/stream_executor/stream.cc:4976] [stream=000001E1AF327560,impl=000001E1A391B2F0] did not memset GPU location; source: 000000B3A760C058; size: 8388608; pattern: ffffffff
2022-02-23 11:58:11.488390: I tensorflow/stream_executor/stream.cc:4976] [stream=000001E1AF327560,impl=000001E1A391B2F0] did not memset GPU location; source: 000000B3A7DCBD08; size: 8388608; pattern: ffffffff
2022-02-23 11:58:11.493417: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at conv_ops.cc:1006 : Not found: No algorithm worked!
2022-02-23 11:58:11.496159: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at conv_ops.cc:1006 : Not found: No algorithm worked!
2022-02-23 11:58:11.497784: I tensorflow/stream_executor/stream.cc:4976] [stream=000001E1AF327560,impl=000001E1A391B2F0] did not memset GPU location; source: 000000B3A760C038; size: 8388608; pattern: ffffffff
2022-02-23 11:58:11.503459: I tensorflow/stream_executor/stream.cc:4976] [stream=000001E1AF327560,impl=000001E1A391B2F0] did not memset GPU location; source: 000000B3A760C058; size: 8388608; pattern: ffffffff
2022-02-23 11:58:11.505625: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at conv_ops.cc:1006 : Not found: No algorithm worked!
2022-02-23 11:58:11.844846: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at conv_ops.cc:1006 : Not found: No algorithm worked!
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
    return fn(*args)
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(786432, 3), b.shape=(3, 3), m=786432, n=3, k=3
    [[{{node Tensordot/MatMul}}]]
    [[mul_10/_893]]
  (1) Internal: Blas GEMM launch failed : a.shape=(786432, 3), b.shape=(3, 3), m=786432, n=3, k=3
    [[{{node Tensordot/MatMul}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 101, in
Original stack trace for 'Tensordot/MatMul':
File "train.py", line 101, in
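
One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: the RTX A4000 is an Ampere card (compute capability 8.6), and native support for that capability first appeared in CUDA 11.1, while the environment above has cudatoolkit 10.0.130. The sketch below only compares version numbers. The lookup table and the helper names (parse_version, toolkit_supports) are made up for illustration and are not part of AnimeGANv2 or TensorFlow.

```python
# Minimum CUDA toolkit release with native support for a given
# GPU compute capability (illustrative subset, not exhaustive).
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),    # Volta
    (7, 5): (10, 0),   # Turing
    (8, 0): (11, 0),   # Ampere (A100)
    (8, 6): (11, 1),   # Ampere (RTX 30 series, RTX A4000)
}


def parse_version(text):
    """Turn a version string such as '10.0.130' into (major, minor)."""
    parts = text.split(".")
    return int(parts[0]), int(parts[1])


def toolkit_supports(capability, toolkit_version):
    """Return True if the toolkit is new enough for the compute capability."""
    required = MIN_CUDA_FOR_CAPABILITY[capability]
    return parse_version(toolkit_version) >= required


if __name__ == "__main__":
    # Values taken from the package list in this report.
    print(toolkit_supports((8, 6), "10.0.130"))
```

If the check reports a mismatch, the cuBLAS failure could come from the toolkit predating the GPU rather than from train.py itself. That would need to be confirmed by trying a TensorFlow build that targets CUDA 11.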