An unexplained error during training
Environment: Ubuntu 16, RTX 2080 Ti, Python 2.7, cudatoolkit 9.0, cuDNN 7.1.2
The error is below. Where might the problem be?
2019-05-11 00:20:11.500263: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-11 00:20:11.873102: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:88:00.0
totalMemory: 10.73GiB freeMemory: 10.57GiB
2019-05-11 00:20:11.873190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-05-11 00:20:12.375421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-11 00:20:12.375482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2019-05-11 00:20:12.375490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2019-05-11 00:20:12.376248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10213 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:88:00.0, compute capability: 7.5)
Forward pass: d_min = 425.000000, d_max = 931.150000.
2019-05-11 00:21:27.889785: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x562e4471b430
Forward pass: d_min = 425.000000, d_max = 931.150000.
2019-05-11 00:21:28.671645: E tensorflow/stream_executor/cuda/cuda_blas.cc:654] failed to run cuBLAS routine cublasSgemv_v2: CUBLAS_STATUS_EXECUTION_FAILED
2019-05-11 00:21:29.142631: I tensorflow/stream_executor/stream.cc:4737] stream 0x562e44930550 did not memcpy host-to-device; source: 0x7fb5ae822b00
2019-05-11 00:21:29.142743: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at matrix_inverse_op.cc:191 : Internal: MatInvBatched: failed to copy pointers to device
Traceback (most recent call last):
File "train.py", line 352, in
Caused by op u'Model_tower0/get_homographies/MatMul_1', defined at:
File "train.py", line 352, in
InternalError (see above for traceback): Blas xGEMV launch failed : a.shape=[1,3,3], b.shape=[1,3,1], m=3, n=1, k=3 [[Node: Model_tower0/get_homographies/MatMul_1 = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Model_tower0/get_homographies/transpose_1, Model_tower0/get_homographies/Squeeze_5)]] [[Node: Model_tower0/gradients/AddN_515/_2989 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_94560_Model_tower0/gradients/AddN_515", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
Are you training on the DTU dataset, and did you change the image size at all?
Can I run this code on my laptop? It has Ubuntu 18.04, a GTX 1060 (3 GB), an i7-6700HQ, 16 GB of RAM, 64-bit.
@x1597275 It should be fine provided the GPU has 11 GB of memory or more; our 1080 cards (11 GB of GPU memory) can run it.
@x1597275 His card is a 1060 with only 3 GB of GPU memory, so it won't work.
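
For anyone who wants to check their own card before trying, here is a rough sketch in Python that compares the total GPU memory reported by `nvidia-smi` against the roughly 11 GB figure mentioned above. The helper names and the 10.5 GiB threshold are my own assumptions, not something from the project. The threshold sits a little under 11 because a nominal 11 GB card reports less than that (the log above shows `totalMemory: 10.73GiB`). The actual requirement probably also depends on image size and batch size, so treat this as a quick sanity check only.

```python
import subprocess

# Assumption: threshold derived from the "11 GB" figure quoted in this thread,
# lowered slightly because 11 GB cards report under 11 GiB of total memory.
REQUIRED_GIB = 10.5


def parse_total_memory_mib(smi_output):
    """Parse nvidia-smi CSV output (one integer in MiB per line, one line per GPU)."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_total_memory_mib():
    """Ask nvidia-smi for the total memory of every visible GPU, in MiB."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"]
    )
    return parse_total_memory_mib(out.decode("utf-8"))


def has_enough_memory(total_mib, required_gib=REQUIRED_GIB):
    """Return True if the card's total memory meets the assumed threshold."""
    return total_mib / 1024.0 >= required_gib


if __name__ == "__main__":
    for index, total in enumerate(query_total_memory_mib()):
        verdict = "probably enough" if has_enough_memory(total) else "probably too small"
        print("GPU %d: %d MiB total, %s" % (index, total, verdict))
```

By this check a 3 GB GTX 1060 falls well short, which matches the reply above.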