DeepV2D
Can't run demo with batch size 8
Hi, I was trying to run the demo:
python demos/demo_v2d.py --model=models/scannet.ckpt --sequence=data/demos/scannet_0
but got the following error:
2020-02-27 14:07:27.062479: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasGemmBatchedEx: CUBLAS_STATUS_NOT_SUPPORTED
2020-02-27 14:07:27.062517: E tensorflow/stream_executor/cuda/cuda_blas.cc:2574] Internal: failed BLAS call, see log for details
Traceback (most recent call last):
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[134400,2,3], b.shape=[134400,3,6], m=2, n=6, k=3, batch_size=134400
[[{{node motion/PnP/einsum_1/MatMul}} = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](motion/PnP/einsum_1/Reshape, motion/PnP/einsum_1/Reshape_1)]]
[[{{node motion/PnP_2/einsum_7/Reshape_2/_2363}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0",
send_device_incarnation=1, tensor_name="edge_5308_motion/PnP_2/einsum_7/Reshape_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "demos/demo_v2d.py", line 82, in <module>
main(args)
File "demos/demo_v2d.py", line 64, in main
depths, poses = deepv2d(images, intrinsics, viz=True, iters=args.n_iters)
File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/deepv2d.py", line 462, in __call__
self.update_poses(i)
File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/deepv2d.py", line 368, in update_poses
self.poses, self.intrinsics, self.weights = self.sess.run(outputs, feed_dict=feed_dict)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas xGEMMBatched launch failed : a.shape=[134400,2,3], b.shape=[134400,3,6], m=2, n=6, k=3, batch_size=134400
[[node motion/PnP/einsum_1/MatMul (defined at /projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/utils/einsum.py:49) = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](motion/PnP/einsum_1/Reshape, motion/PnP/einsum_1/Reshape_1)]]
[[{{node motion/PnP_2/einsum_7/Reshape_2/_2363}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0",
send_device_incarnation=1, tensor_name="edge_5308_motion/PnP_2/einsum_7/Reshape_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'motion/PnP/einsum_1/MatMul', defined at:
File "demos/demo_v2d.py", line 82, in <module>
main(args)
File "demos/demo_v2d.py", line 55, in main
deepv2d = DeepV2D(cfg, args.model, use_fcrn=args.fcrn, is_calibrated=is_calibrated, mode=args.mode)
File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/deepv2d.py", line 68, in __init__
self._build_motion_graph()
File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/deepv2d.py", line 129, in _build_motion_graph
images, depths, intrinsics, edge_inds, init=do_init)
File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/modules/motion.py", line 282, in forward
Tij = Tij.keyframe_optim(target, weight, depths, intrinsics)
File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/geometry/transformation.py", line 364, in keyframe_optim
J = einsum('...ij,...jk->...ik', jproj, jtran)
File "/projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/utils/einsum.py", line 49, in einsum
out = tf.einsum(equation, *inputs)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/special_math_ops.py", line 257, in einsum
axes_to_sum)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/special_math_ops.py", line 389, in _einsum_reduction
product = math_ops.matmul(t0, t1)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 2019, in matmul
a, b, adj_x=adjoint_a, adj_y=adjoint_b, name=name)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1245, in batch_mat_mul
"BatchMatMul", x=x, y=y, adj_x=adj_x, adj_y=adj_y, name=name)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/homes/grail/xuanluo/anaconda3/envs/deepv2d/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[134400,2,3], b.shape=[134400,3,6], m=2, n=6, k=3, batch_size=134400
[[node motion/PnP/einsum_1/MatMul (defined at /projects/grail/xuanluo/telepresence/related-packages/DeepV2D/deepv2d/utils/einsum.py:49) = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](motion/PnP/einsum_1/Reshape, motion/PnP/einsum_1/Reshape_1)]]
[[{{node motion/PnP_2/einsum_7/Reshape_2/_2363}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5308_motion/PnP_2/einsum_7/Reshape_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
My environment is Python 3.6.7 with tensorflow-gpu 1.12.0. It seems the problem is that the batch size is too big: the demo succeeds when I use only 4 images. Can you help?
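For readers following the traceback: the failing node is the batched matrix product produced by einsum('...ij,...jk->...ik', jproj, jtran) in transformation.py. The batch_size=134400 in the message is the number of small matrix pairs (2x3 times 3x6) in the flattened leading dimensions, not the number of input images. Below is a minimal pure-Python sketch of what that op computes, on a tiny batch of 2 instead of 134400. The name batched_matmul and the sample values are hypothetical and are not part of DeepV2D or TensorFlow.

```python
def batched_matmul(a, b):
    """Multiply a[i] (m x k) by b[i] (k x n) for every batch index i.

    Hypothetical helper: mirrors the einsum '...ij,...jk->...ik'
    with a single flattened batch dimension.
    """
    out = []
    for x, y in zip(a, b):
        k = len(y)      # shared inner dimension
        n = len(y[0])   # columns of the result
        out.append([
            [sum(row[t] * y[t][j] for t in range(k)) for j in range(n)]
            for row in x
        ])
    return out


# Tiny batch of 2 with the same per-matrix sizes as in the error: m=2, k=3, n=6.
a = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]] for _ in range(2)]
b = [[[float(r * 6 + c) for c in range(6)] for r in range(3)] for _ in range(2)]

j = batched_matmul(a, b)
print(len(j), len(j[0]), len(j[0][0]))  # batch, m, n
```

Because each a[i] here selects the first two rows of b[i], the result for every batch entry is simply those two rows, which makes the output easy to check by eye.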
Looks like a CUDA error; I don't think batch size should matter in this case. What GPU are you using?
I also have this problem. My CUDA is 9.0 and my TensorFlow is 1.12.0. How can I solve it?
@zachteed same problem, do you have any solution? Or which CUDA version is required?
Same issue for me. After some googling, it seems to have something to do with the special combination of TensorFlow 1.12 + RTX 2080. So after upgrading TensorFlow from 1.12.0 to 1.14.0 along with CUDA 10.0, it finally works for me :)
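A quick way to check whether an environment is on the older TensorFlow release is sketched below. It rests on assumptions: the (1, 14) threshold is taken only from the report above that 1.14.0 with CUDA 10.0 worked, not from an official requirement, and the helper names parse_major_minor and older_than are hypothetical. The only TensorFlow attribute used is tf.__version__.

```python
def parse_major_minor(version):
    """Return (major, minor) as ints from a string such as '1.12.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def older_than(version, minimum=(1, 14)):
    """True if `version` is below `minimum`.

    The default (1, 14) is an assumption based on the report in this
    thread, not a documented DeepV2D requirement.
    """
    return parse_major_minor(version) < minimum


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        print("TensorFlow", tf.__version__)
        if older_than(tf.__version__):
            print("Older than 1.14; the upgrade described above may help")
```

Running the script inside the deepv2d conda environment prints the installed version and a hint if it is older than the one reported to work.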
Works for me too, many thanks!