mmvec Error running on GPU: device renaming issue?

Open FranckLejzerowicz opened this issue 5 years ago • 2 comments

Hi,

So here's a command run on a GPU node in an interactive Slurm srun session:

$ rhapsody mmvec \
   --microbe-file A.biom \
   --metabolite-file B.biom  \
   --min-feature-count 5  \
   --epochs 20000 \
   --batch-size 1000  \
   --latent-dim 3  \
   --input-prior 1  \
   --learning-rate 1e-4  \
   --beta1 0.85 \
   --beta2 0.90  \
   --checkpoint-interval 60  \
   --summary-interval 60 \
   --arm-the-gpu  \
   --summary-dir gpu_1000_1e-4_20000  \
   --ranks-file gpu_1000_1e-4_20000/ranks.csv

The (long) error (sorry):


WARNING: Logging before flag parsing goes to stderr.
W0828 12:38:30.259999 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/bin/rhapsody:156: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

W0828 12:38:30.262325 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/bin/rhapsody:157: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2019-08-28 12:38:30.262596: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-08-28 12:38:30.273506: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-08-28 12:38:32.273961: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560b1e030b60 executing computations on platform CUDA. Devices:
2019-08-28 12:38:32.274039: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla V100-PCIE-32GB, Compute Capability 7.0
2019-08-28 12:38:32.291287: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2100000000 Hz
2019-08-28 12:38:32.294314: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560b1d6caf10 executing computations on platform Host. Devices:
2019-08-28 12:38:32.294405: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-08-28 12:38:32.297357: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla V100-PCIE-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:5e:00.0
2019-08-28 12:38:32.298520: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm-18.08.0/lib::/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64:/home/flejzerowicz/openssl/lib:/home/flejzerowicz/usr/lib/lib/:/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64
2019-08-28 12:38:32.299494: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm-18.08.0/lib::/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64:/home/flejzerowicz/openssl/lib:/home/flejzerowicz/usr/lib/lib/:/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64
2019-08-28 12:38:32.300329: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm-18.08.0/lib::/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64:/home/flejzerowicz/openssl/lib:/home/flejzerowicz/usr/lib/lib/:/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64
2019-08-28 12:38:32.301209: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm-18.08.0/lib::/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64:/home/flejzerowicz/openssl/lib:/home/flejzerowicz/usr/lib/lib/:/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64
2019-08-28 12:38:32.302105: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm-18.08.0/lib::/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64:/home/flejzerowicz/openssl/lib:/home/flejzerowicz/usr/lib/lib/:/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64
2019-08-28 12:38:32.302962: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm-18.08.0/lib::/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64:/home/flejzerowicz/openssl/lib:/home/flejzerowicz/usr/lib/lib/:/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64
2019-08-28 12:38:32.304020: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/slurm-18.08.0/lib::/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64:/home/flejzerowicz/openssl/lib:/home/flejzerowicz/usr/lib/lib/:/home/flejzerowicz/local/lib:/home/flejzerowicz/local/lib64
2019-08-28 12:38:32.304122: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-08-28 12:38:32.304182: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-28 12:38:32.304231: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-08-28 12:38:32.304265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
W0828 12:38:32.641206 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:94: The name tf.log is deprecated. Please use tf.math.log instead.

W0828 12:38:32.643565 140077172123456 deprecation.py:323] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:95: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.random.categorical` instead.
W0828 12:38:32.655179 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:106: The name tf.random_normal is deprecated. Please use tf.random.normal instead.

W0828 12:38:32.694295 140077172123456 deprecation.py:323] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:122: Normal.__init__ (from tensorflow.python.ops.distributions.normal) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
W0828 12:38:32.695811 140077172123456 deprecation.py:323] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/ops/distributions/normal.py:160: Distribution.__init__ (from tensorflow.python.ops.distributions.distribution) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
W0828 12:38:32.724381 140077172123456 deprecation.py:323] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:139: Multinomial.__init__ (from tensorflow.python.ops.distributions.multinomial) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
W0828 12:38:32.802299 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:187: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

W0828 12:38:32.805364 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:189: The name tf.summary.histogram is deprecated. Please use tf.compat.v1.summary.histogram instead.

W0828 12:38:32.810857 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:193: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

W0828 12:38:32.812450 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:195: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

W0828 12:38:32.851014 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:200: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

W0828 12:38:33.204426 140077172123456 deprecation.py:323] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/ops/clip_ops.py:286: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0828 12:38:33.331943 140077172123456 deprecation_wrapper.py:119] From /home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py:210: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

Traceback (most recent call last):
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1339, in _run_fn
    self._extend_graph()
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1374, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation random_normal/RandomStandardNormal: {{node random_normal/RandomStandardNormal}}was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0 ]. Make sure the device specification refers to a valid device.
	 [[random_normal/RandomStandardNormal]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/flejzerowicz/rhapsody_ve_new/bin/rhapsody", line 221, in <module>
    rhapsody()
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/flejzerowicz/rhapsody_ve_new/bin/rhapsody", line 168, in mmvec
    test_microbes_coo, test_metabolites_df.values)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/rhapsody/multimodal.py", line 210, in __call__
    tf.global_variables_initializer().run()
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2679, in run
    _run_using_default_session(self, feed_dict, self.graph, session)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 5614, in _run_using_default_session
    session.run(operation, feed_dict)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/flejzerowicz/rhapsody_ve_new/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation random_normal/RandomStandardNormal: node random_normal/RandomStandardNormal (defined at /lib/python3.6/site-packages/rhapsody/multimodal.py:106) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0, /job:localhost/replica:0/task:0/device:XLA_GPU:0 ]. Make sure the device specification refers to a valid device.
	 [[random_normal/RandomStandardNormal]]

Here is the possibly relevant sinfo output:

$ sinfo -p gpu -N -o "%c %D %G %m %P"

CPUS NODES GRES MEMORY PARTITION
32 1 gpu:1 94208 gpu
32 1 gpu:1 94208 gpu

Any help greatly appreciated :) Thanks! Franck

Aug 28 '19 19:08 FranckLejzerowicz

Hi @FranckLejzerowicz, just to confirm, have you been able to run nvidia-smi? That can help to see if there are GPUs available. It looks like there is one GPU being recognized, but I'm not sure why it isn't being properly loaded.

Aug 29 '19 15:08 mortonjt

Hi @FranckLejzerowicz, there are two problems with the tensorflow-gpu setup:

Tensorflow-gpu must be installed independently of tensorflow.

To do this, you'd need the following

pip uninstall tensorflow
pip uninstall tensorflow-gpu
pip install tensorflow-gpu --upgrade

You need the right libraries linked to your GPU, so you'd need to module load CUDA and cuDNN on your cluster (I'm using CUDA 10 and cuDNN v7.6.2).
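One way to check whether the CUDA libraries are visible before launching a run is to try opening them with the dynamic loader from Python. This is only a rough sketch: the library names are copied from the "Could not dlopen library" lines in the log above, and the helper name `missing_libraries` is made up for illustration.

```python
import ctypes

# Library names taken from the "Could not dlopen library" log lines.
# They match CUDA 10.0 / cuDNN 7; other TensorFlow builds may expect
# different versions.
CUDA_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
    "libcudnn.so.7",
]


def missing_libraries(names):
    """Return the library names that the dynamic loader cannot open.

    Hypothetical helper: it relies on the same search rules as dlopen,
    so it honours LD_LIBRARY_PATH as set when Python was started.
    """
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_libraries(CUDA_LIBS):
        print("cannot load:", name)
```

If this prints any names after the module load step, the loader is probably still not finding them, and TensorFlow would likely skip registering the GPU again.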

Below are a couple of commands that I would run inside Python for debugging:

import tensorflow as tf
tf.test.gpu_device_name()

and

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
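When reading the device list, note that an XLA_GPU entry is not enough: the error above asks for /device:GPU:0, which was absent even though XLA_GPU:0 was listed. A small sketch of that check follows. The function name `plain_gpu_devices` is hypothetical, and the two example lists are copied from the error message and from the device listing in this thread.

```python
def plain_gpu_devices(device_names):
    """Keep only names ending in /device:GPU:<n>, excluding XLA_GPU entries."""
    return [name for name in device_names if "/device:GPU:" in name]


# Devices reported in the InvalidArgumentError above (no plain GPU).
failing = [
    "/job:localhost/replica:0/task:0/device:CPU:0",
    "/job:localhost/replica:0/task:0/device:XLA_CPU:0",
    "/job:localhost/replica:0/task:0/device:XLA_GPU:0",
]

# Devices reported by a working setup.
working = [
    "/device:CPU:0",
    "/device:XLA_GPU:0",
    "/device:XLA_CPU:0",
    "/device:GPU:0",
]

print(plain_gpu_devices(failing))
print(plain_gpu_devices(working))

# With TensorFlow installed, the names could be collected like this
# (left commented out so the sketch runs without TensorFlow):
# from tensorflow.python.client import device_lib
# names = [d.name for d in device_lib.list_local_devices()]
# print(plain_gpu_devices(names))
```

An empty result for your own device list would suggest that the CUDA libraries are still not being loaded.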

Here is some of the output from my setup

>>> import tensorflow as tf
>>> tf.test.gpu_device_name()
2019-09-05 09:30:33.371294: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-09-05 09:30:33.399783: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1

2019-09-05 09:30:33.616269: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x38b4a60 executing computations on platform CUDA. Devices:
2019-09-05 09:30:33.616301: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla V100-PCIE-32GB, Compute Capability 7.0
2019-09-05 09:30:33.618859: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz
2019-09-05 09:30:33.620264: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3928a30 executing computations on platform Host. Devices:
2019-09-05 09:30:33.620282: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-09-05 09:30:33.621944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Tesla V100-PCIE-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:06:00.0
2019-09-05 09:30:33.622771: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-05 09:30:33.624800: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-05 09:30:33.626704: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-05 09:30:33.627375: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-05 09:30:33.629704: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-05 09:30:33.631585: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-05 09:30:33.635094: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-05 09:30:33.638320: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-05 09:30:33.638422: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-05 09:30:33.641755: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-05 09:30:33.641837: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2019-09-05 09:30:33.641895: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2019-09-05 09:30:33.645848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/device:GPU:0 with 30555 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0)
'/device:GPU:0'
>>>
>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
2019-09-05 09:31:25.908226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Tesla V100-PCIE-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.38
pciBusID: 0000:06:00.0
2019-09-05 09:31:25.908324: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-05 09:31:25.908358: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-05 09:31:25.908389: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-05 09:31:25.908419: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-05 09:31:25.908449: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-05 09:31:25.908479: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-05 09:31:25.908509: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-05 09:31:25.918502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-05 09:31:25.918541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-05 09:31:25.918551: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2019-09-05 09:31:25.918559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2019-09-05 09:31:25.923547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/device:GPU:0 with 30555 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 15270850500731088615
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 17617425747417705410
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 13070795884554441190
physical_device_desc: "device: XLA_CPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 32039642727
locality {
  bus_id: 1
  links {
  }
}
incarnation: 6846769415979563337
physical_device_desc: "device: 0, name: Tesla V100-PCIE-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0"
]

Sep 03 '19 17:09 mortonjt
