How to use multiple GPU devices?
When I use the docker image "opsh2oai/h2o-deepwater" and start it with the following script:
sudo nvidia-docker run -itd \
-p 54321:54321 \
--net=hadoop2 \
--name h2o-dw-single \
--hostname h2o-dw-single \
opsh2oai/h2o-deepwater &> /dev/null
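(Side note on GPU visibility: nvidia-docker v1 lets you restrict which GPUs a container sees via the NV_GPU environment variable, which is worth checking when debugging device selection. A sketch reusing the image above; the container name "h2o-dw-test" is just an illustrative placeholder:

```shell
# Expose only GPUs 0 and 1 to the container (nvidia-docker v1 syntax);
# omit NV_GPU entirely to expose all GPUs, as in the command above.
sudo NV_GPU=0,1 nvidia-docker run -itd \
  -p 54321:54321 \
  --name h2o-dw-test \
  --hostname h2o-dw-test \
  opsh2oai/h2o-deepwater
```
)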
And start the H2O service with:
java -jar /opt/h2o.jar
I try to build a DeepWater model on the MNIST dataset with GPU enabled and device_id=0,1,2,3 (via H2O Flow).
But only one GPU device is used (device id '0').
How can I use multiple devices?
When no job is running:
$ nvidia-smi
Mon Oct 23 10:40:29 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90 Driver Version: 384.90 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:05:00.0 Off | N/A |
| 25% 44C P8 18W / 250W | 249MiB / 12187MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN Xp Off | 00000000:06:00.0 Off | N/A |
| 25% 44C P8 18W / 250W | 173MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 TITAN Xp Off | 00000000:09:00.0 Off | N/A |
| 24% 44C P8 18W / 250W | 173MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 TITAN Xp Off | 00000000:0A:00.0 Off | N/A |
| 23% 36C P8 17W / 250W | 173MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
While building the DeepWater model:
$ nvidia-smi
Mon Oct 23 10:42:09 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.90 Driver Version: 384.90 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:05:00.0 Off | N/A |
| 25% 46C P2 61W / 250W | 249MiB / 12187MiB | 29% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN Xp Off | 00000000:06:00.0 Off | N/A |
| 25% 46C P2 61W / 250W | 173MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 TITAN Xp Off | 00000000:09:00.0 Off | N/A |
| 24% 45C P2 60W / 250W | 173MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 TITAN Xp Off | 00000000:0A:00.0 Off | N/A |
| 23% 38C P2 60W / 250W | 173MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
@aspigex which network are you building? If you're not specifying the network type explicitly, the type will be inferred automatically (based on the input data format). You should be able to find it in the logs (lenet, alexnet, MLP, resnet, inception, vgg).
Currently only lenet is implemented in a multi-GPU fashion. For other networks you'd have to either contribute to deepwater by rewriting the network definition files (found here: https://github.com/h2oai/deepwater/tree/master/tensorflow/src/main/resources/deepwater/models) to use multiple GPUs (like this one: https://github.com/h2oai/deepwater/blob/master/tensorflow/src/main/resources/deepwater/models/lenet.py), or use the option below.
Alternatively, you can write your own multi-GPU TF network in Python, save the model meta file, and use it in DeepWater. Here's an example of how to do it: https://github.com/h2oai/h2o-3/blob/master/examples/deeplearning/notebooks/deeplearning_tensorflow_cat_dog_mouse_lenet.ipynb (see the Custom models section).
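For orientation, the data-parallel "tower" pattern that the multi-GPU lenet.py definition uses can be sketched framework-agnostically: split each batch into one shard per requested device, compute a gradient per device, then average the gradients before a single weight update. The helper names below are illustrative only, not part of the DeepWater or TensorFlow APIs:

```python
# Framework-agnostic sketch of the data-parallel ("tower") pattern:
# each device receives an equal slice of the batch, computes its own
# gradient, and the gradients are averaged before one weight update.

def split_batch(batch, device_ids):
    """Split a batch into one shard per device (last shard takes the remainder)."""
    n = len(device_ids)
    size = len(batch) // n
    shards = [batch[i * size:(i + 1) * size] for i in range(n - 1)]
    shards.append(batch[(n - 1) * size:])
    return dict(zip(device_ids, shards))

def average_gradients(per_device_grads):
    """Element-wise mean of the gradient vectors produced by each device."""
    n = len(per_device_grads)
    return [sum(g) / n for g in zip(*per_device_grads)]

if __name__ == "__main__":
    shards = split_batch(list(range(10)), device_ids=[0, 1, 2, 3])
    print(shards)  # {0: [0, 1], 1: [2, 3], 2: [4, 5], 3: [6, 7, 8, 9]}
    print(average_gradients([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]))  # [4.0, 5.0]
```

In the real lenet.py this split corresponds to placing one model replica per GPU with tf.device, which is why networks without such per-device placement in their definition files fall back to a single GPU.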