
Tesla K80 Compilation

Open xingyousong opened this issue 6 years ago • 3 comments

Hi, when I try to run the test code on an Amazon EC2 P2 instance (these use NVIDIA Tesla K80 GPUs, which are Kepler-architecture cards), I get the following error:

```
(manarprojenv) ubuntu@ip-172-31-47-48:~/scratch_manar$ python blocksparse_scripy.py
2018-04-21 07:30:01.823848: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-04-21 07:30:01.920509: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-21 07:30:01.920894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-04-21 07:30:01.920929: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-21 07:30:02.235420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-21 07:30:02.235492: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
2018-04-21 07:30:02.235501: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
2018-04-21 07:30:02.235863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10761 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
2018-04-21 07:30:02.251499: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at blocksparse_matmul_op.cc:208 : Internal: invalid resource handle
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1312, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: invalid resource handle
	 [[Node: BlocksparseMatMul_000000 = BlocksparseMatmul[C=4096, K=4096, alpha=1, axis=1, bench=0, beta=0, blocks=8246, bshift=5, dtype_dw=DT_FLOAT, dtype_w=DT_FLOAT, dtype_x=DT_FLOAT, dtype_y=DT_FLOAT, locks=0, locks_dx=0, segments=128, segments_dx=128, shared=624, shared_dx=624, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_Placeholder_0_0/_1, w/read, BlocksparseMatMul/fprop_lut/_3, BlocksparseMatMul/bprop_lut/_5, BlocksparseMatMul/updat_lut/_7)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "blocksparse_scripy.py", line 27, in <module>
    result = sess.run([y], feed_dict = {x: np.ones((minibatch_size,hidden_size), dtype='float32')})
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1140, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    run_metadata)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: invalid resource handle
	 [[Node: BlocksparseMatMul_000000 = BlocksparseMatmul[C=4096, K=4096, alpha=1, axis=1, bench=0, beta=0, blocks=8246, bshift=5, dtype_dw=DT_FLOAT, dtype_w=DT_FLOAT, dtype_x=DT_FLOAT, dtype_y=DT_FLOAT, locks=0, locks_dx=0, segments=128, segments_dx=128, shared=624, shared_dx=624, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_Placeholder_0_0/_1, w/read, BlocksparseMatMul/fprop_lut/_3, BlocksparseMatMul/bprop_lut/_5, BlocksparseMatMul/updat_lut/_7)]]

Caused by op 'BlocksparseMatMul_000000', defined at:
  File "blocksparse_scripy.py", line 22, in <module>
    y = bsmm(x, w)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/blocksparse/matmul.py", line 383, in __call__
    shared=self.fprop_shared, shared_dx=self.bprop_shared, bench=bench, name=name
  File "", line 650, in blocksparse_matmul
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 328, in apply_op
    op_type_name, name, **keywords)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
    op_def=op_def)
  File "/home/ubuntu/anaconda3/envs/manarprojenv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InternalError (see above for traceback): invalid resource handle
	 [[Node: BlocksparseMatMul_000000 = BlocksparseMatmul[C=4096, K=4096, alpha=1, axis=1, bench=0, beta=0, blocks=8246, bshift=5, dtype_dw=DT_FLOAT, dtype_w=DT_FLOAT, dtype_x=DT_FLOAT, dtype_y=DT_FLOAT, locks=0, locks_dx=0, segments=128, segments_dx=128, shared=624, shared_dx=624, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_Placeholder_0_0/_1, w/read, BlocksparseMatMul/fprop_lut/_3, BlocksparseMatMul/bprop_lut/_5, BlocksparseMatMul/updat_lut/_7)]]

(manarprojenv) ubuntu@ip-172-31-47-48:~/scratch_manar$
```

I understand this is probably an issue with the compilation flags, but I'm not sure which flags to use. I've tried following guides online and adding the matching gencodes (http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/), but nothing seems to work for the Tesla K80. Could you give a set of flags that works for older cards? Should I also change major, minor to 3, 7?

Thanks so much!

xingyousong avatar Apr 21 '18 08:04 xingyousong

The feature axis=0 kernels are written in assembly and only work on Maxwell/Pascal cards. The CUDA-C kernels, which operate on feature axis=1, should work on all cards.
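A minimal sketch of what choosing the axis=1 path looks like. The layout construction below runs anywhere; the commented-out part needs a CUDA GPU plus the blocksparse package, and the exact keyword name `feature_axis` is an assumption drawn from blocksparse/matmul.py, not something this thread confirms:

```python
# Build a block-level sparsity layout with NumPy, then (on a GPU machine)
# construct the matmul with feature_axis=1 so the portable CUDA-C kernels
# are selected instead of the Maxwell/Pascal-only assembly kernels.
import numpy as np

hidden_size = 4096
block_size = 32
grid = hidden_size // block_size  # 128 x 128 grid of blocks

# Random block-sparsity pattern: 1 marks a block that is present.
rng = np.random.RandomState(0)
layout = (rng.rand(grid, grid) < 0.05).astype(np.int32)

# Hypothetical GPU-side usage (requires blocksparse + TF 1.x + CUDA GPU):
#
#   from blocksparse.matmul import BlocksparseMatMul
#   import tensorflow as tf
#
#   bsmm = BlocksparseMatMul(layout, block_size=block_size, feature_axis=1)
#   x = tf.placeholder(tf.float32, shape=[None, hidden_size])
#   w = tf.get_variable("w", bsmm.w_shape, dtype=tf.float32)
#   y = bsmm(x, w)

print(layout.shape, int(layout.sum()))
```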

scott-gray avatar Apr 23 '18 17:04 scott-gray

Thanks for the reply. Where should I change this "feature axis" setting? And should I still set major, minor = 3, 7 to match the K80 architecture?

xingyousong avatar Apr 23 '18 17:04 xingyousong

If you search the LSTM example code for the hps.axis hyperparameter, you can see all the places where it is accounted for in that model. You can add "-gencode=arch=compute_37,code=sm_37" to the Makefile if that helps, but I think 3.7 is backwards compatible with 3.5, so sm_35 code should already run on the K80.
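For reference, the extra architecture would be appended to the nvcc flags in the Makefile; the variable name `NVCCFLAGS` here is illustrative, so match it to whatever the project's Makefile actually calls its nvcc flag variable:

```makefile
# Illustrative fragment only. sm_35 binaries also run on sm_37 parts
# (the K80), so the explicit compute_37 entry may be redundant.
NVCCFLAGS += -gencode=arch=compute_35,code=sm_35 \
             -gencode=arch=compute_37,code=sm_37
```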

scott-gray avatar Apr 23 '18 17:04 scott-gray