blocksparse
Tesla K80 Compilation
Hi, when I try to run the test code on an Amazon EC2 instance (P2 instances have Nvidia Tesla K80s, which use the Kepler architecture), it gives me the following error:
```
(manarprojenv) ubuntu@ip-172-31-47-48:~/scratch_manar$ python blocksparse_scripy.py
2018-04-21 07:30:01.823848: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-04-21 07:30:01.920509: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-21 07:30:01.920894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-04-21 07:30:01.920929: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-21 07:30:02.235420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-21 07:30:02.235492: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
2018-04-21 07:30:02.235501: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
2018-04-21 07:30:02.235863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10761 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
2018-04-21 07:30:02.251499: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at blocksparse_matmul_op.cc:208 : Internal: invalid resource handle
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1312, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: invalid resource handle
	 [[Node: BlocksparseMatMul_000000 = BlocksparseMatmul[C=4096, K=4096, alpha=1, axis=1, bench=0, beta=0, blocks=8246, bshift=5, dtype_dw=DT_FLOAT, dtype_w=DT_FLOAT, dtype_x=DT_FLOAT, dtype_y=DT_FLOAT, locks=0, locks_dx=0, segments=128, segments_dx=128, shared=624, shared_dx=624, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_Placeholder_0_0/_1, w/read, BlocksparseMatMul/fprop_lut/_3, BlocksparseMatMul/bprop_lut/_5, BlocksparseMatMul/updat_lut/_7)]]
```
```
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "blocksparse_scripy.py", line 27, in <module>
    result = sess.run([y], feed_dict = {x: np.ones((minibatch_size,hidden_size), dtype='float32')})
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1140, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    run_metadata)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: invalid resource handle
	 [[Node: BlocksparseMatMul_000000 = BlocksparseMatmul[C=4096, K=4096, alpha=1, axis=1, bench=0, beta=0, blocks=8246, bshift=5, dtype_dw=DT_FLOAT, dtype_w=DT_FLOAT, dtype_x=DT_FLOAT, dtype_y=DT_FLOAT, locks=0, locks_dx=0, segments=128, segments_dx=128, shared=624, shared_dx=624, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_Placeholder_0_0/_1, w/read, BlocksparseMatMul/fprop_lut/_3, BlocksparseMatMul/bprop_lut/_5, BlocksparseMatMul/updat_lut/_7)]]

Caused by op 'BlocksparseMatMul_000000', defined at:
  File "blocksparse_scripy.py", line 22, in <module>
    y = bsmm(x, w)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/blocksparse/matmul.py", line 383, in __call__
    shared=self.fprop_shared, shared_dx=self.bprop_shared, bench=bench, name=name
  File "<string>", line 650, in blocksparse_matmul
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 328, in apply_op
    op_type_name, name, **keywords)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
    op_def=op_def)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InternalError (see above for traceback): invalid resource handle
	 [[Node: BlocksparseMatMul_000000 = BlocksparseMatmul[C=4096, K=4096, alpha=1, axis=1, bench=0, beta=0, blocks=8246, bshift=5, dtype_dw=DT_FLOAT, dtype_w=DT_FLOAT, dtype_x=DT_FLOAT, dtype_y=DT_FLOAT, locks=0, locks_dx=0, segments=128, segments_dx=128, shared=624, shared_dx=624, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_Placeholder_0_0/_1, w/read, BlocksparseMatMul/fprop_lut/_3, BlocksparseMatMul/bprop_lut/_5, BlocksparseMatMul/updat_lut/_7)]]

(manarprojenv) ubuntu@ip-172-31-47-48:~/scratch_manar$
```
I understand that this is probably an issue with the compilation flags, but I'm not sure which flags I should use. I've tried following guides online and adding the correct gencodes (http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/), but nothing seems to work for the Tesla K80. Can you give a set of flags that works for older cards? Should I change major, minor = 3, 7 as well?
Thanks so much!
The feature axis=0 kernels are written in assembly and only work on Maxwell/Pascal cards. The CUDA C kernels, which operate on feature axis=1, should work on all cards.
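For reference, a minimal sketch of using the CUDA C path, following the usage shown in the repo README. It assumes the `BlocksparseMatMul` constructor accepts a `feature_axis` keyword; the sizes and the random sparsity pattern are placeholders:

```python
# Minimal sketch (TF 1.x), adapted from the repo README.
# Assumption: BlocksparseMatMul accepts a feature_axis keyword;
# hidden_size, block_size, minibatch_size and the sparsity pattern are placeholders.
import numpy as np
import tensorflow as tf
from blocksparse.matmul import BlocksparseMatMul

hidden_size = 4096
block_size = 32
minibatch_size = 64

# Block-level sparsity pattern: entry (i, j) = 1 keeps block (i, j) of the weight matrix
sparsity = np.random.randint(2, size=(hidden_size // block_size,
                                      hidden_size // block_size))

# feature_axis=1 selects the CUDA C kernels, which should run on Kepler cards like the K80
bsmm = BlocksparseMatMul(sparsity, block_size=block_size, feature_axis=1)

x = tf.placeholder(tf.float32, shape=[None, hidden_size])
w = tf.get_variable("w", bsmm.w_shape, dtype=tf.float32)
y = bsmm(x, w)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    result = sess.run([y], feed_dict={x: np.ones((minibatch_size, hidden_size),
                                                 dtype='float32')})
```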
Thanks for the reply. Where should I change this "feature axis" setting? And should I still set major, minor = 3, 7 to match the K80 architecture?
If you search the LSTM example code for the hps.axis hyperparam, you can see all the places where that is accounted for in that model. You can add "-gencode=arch=compute_37,code=sm_37" to the Makefile if that helps, but I think 3_7 is backwards compatible with 3_5.
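For illustration, the flag would be appended to the nvcc flags in the Makefile. This is a hypothetical fragment: the variable name `NVCCFLAGS` is a placeholder, not necessarily what the blocksparse Makefile actually uses:

```makefile
# Hypothetical fragment: NVCCFLAGS is a placeholder variable name.
# Appends a Kepler target (compute capability 3.7, Tesla K80) to the nvcc flags.
NVCCFLAGS += -gencode=arch=compute_37,code=sm_37
```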