How to use multiple GPU devices?
When I use the docker image "opsh2oai/h2o-deepwater" and start it with the following script:
sudo nvidia-docker run -itd \
-p 54321:54321 \
--net=hadoop2 \
--name h2o-dw-single \
--hostname h2o-dw-single \
opsh2oai/h2o-deepwater &> /dev/null
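(Side note on GPU visibility: nvidia-docker v1 lets you restrict which GPUs a container sees via the NV_GPU environment variable, which is worth checking when debugging device selection. A sketch reusing the image above; the container name "h2o-dw-test" is just an illustrative placeholder:

```shell
# Expose only GPUs 0 and 1 to the container (nvidia-docker v1 syntax);
# omit NV_GPU entirely to expose all GPUs, as in the command above.
sudo NV_GPU=0,1 nvidia-docker run -itd \
  -p 54321:54321 \
  --name h2o-dw-test \
  --hostname h2o-dw-test \
  opsh2oai/h2o-deepwater
```
)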
And start the H2O service with:
java -jar /opt/h2o.jar
I try to build a DeepWater model on the MNIST dataset with GPU enabled and device_id=0,1,2,3 (via H2O Flow).
But only one GPU device is used (device id '0').
How can I use multiple devices?
When no job is running:
$ nvidia-smi
Mon Oct 23 10:40:29 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90 Driver Version: 384.90 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:05:00.0 Off | N/A |
| 25% 44C P8 18W / 250W | 249MiB / 12187MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN Xp Off | 00000000:06:00.0 Off | N/A |
| 25% 44C P8 18W / 250W | 173MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 TITAN Xp Off | 00000000:09:00.0 Off | N/A |
| 24% 44C P8 18W / 250W | 173MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 TITAN Xp Off | 00000000:0A:00.0 Off | N/A |
| 23% 36C P8 17W / 250W | 173MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
While building the DeepWater model:
$ nvidia-smi
Mon Oct 23 10:42:09 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90 Driver Version: 384.90 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:05:00.0 Off | N/A |
| 25% 46C P2 61W / 250W | 249MiB / 12187MiB | 29% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN Xp Off | 00000000:06:00.0 Off | N/A |
| 25% 46C P2 61W / 250W | 173MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 TITAN Xp Off | 00000000:09:00.0 Off | N/A |
| 24% 45C P2 60W / 250W | 173MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 TITAN Xp Off | 00000000:0A:00.0 Off | N/A |
| 23% 38C P2 60W / 250W | 173MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
@aspigex which network are you building? If you're not specifying the network type explicitly, the type will be inferred automatically (based on the input data format). You should be able to find it in the logs (lenet, alexnet, MLP, resnet, inception, vgg).
Currently only lenet is implemented in a multi-GPU fashion. For other networks you'd have to either contribute to deepwater by rewriting the network definition files (found here: https://github.com/h2oai/deepwater/tree/master/tensorflow/src/main/resources/deepwater/models) to use multiple GPUs (like this one: https://github.com/h2oai/deepwater/blob/master/tensorflow/src/main/resources/deepwater/models/lenet.py), or use the option below.
Alternatively, you can write your own multi-GPU TF network in Python, save the model meta file, and use it in DeepWater. Here's an example of how to do it: https://github.com/h2oai/h2o-3/blob/master/examples/deeplearning/notebooks/deeplearning_tensorflow_cat_dog_mouse_lenet.ipynb (see the Custom models section).
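For orientation, the data-parallel "tower" pattern that the multi-GPU lenet.py definition uses can be sketched framework-agnostically: split each batch into one shard per requested device, compute a gradient per device, then average the gradients before a single weight update. The helper names below are illustrative only, not part of the DeepWater or TensorFlow APIs:

```python
# Framework-agnostic sketch of the data-parallel ("tower") pattern:
# each device receives an equal slice of the batch, computes its own
# gradient, and the gradients are averaged before one weight update.

def split_batch(batch, device_ids):
    """Split a batch into one shard per device (last shard takes the remainder)."""
    n = len(device_ids)
    size = len(batch) // n
    shards = [batch[i * size:(i + 1) * size] for i in range(n - 1)]
    shards.append(batch[(n - 1) * size:])
    return dict(zip(device_ids, shards))

def average_gradients(per_device_grads):
    """Element-wise mean of the gradient vectors produced by each device."""
    n = len(per_device_grads)
    return [sum(g) / n for g in zip(*per_device_grads)]

if __name__ == "__main__":
    shards = split_batch(list(range(10)), device_ids=[0, 1, 2, 3])
    print(shards)  # {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7, 8, 9]}
    print(average_gradients([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]))  # [4.0, 5.0]
```

In the real lenet.py this split corresponds to placing one model replica per GPU with tf.device, which is why networks without such per-device placement in their definition files fall back to a single GPU.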