tacotron
Low GPU usage
I am training the model on my two Tesla M40 GPUs. When I use the nvidia-smi command to check GPU usage, utilization stays low and only one GPU is used.
How can I make full use of both GPUs?
I've tried increasing the queue capacity and the number of threads, but it helps little.
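For reference, in a TF 1.x queue-based input pipeline those two knobs usually live in the batching call, roughly as in the sketch below. This is a generic, made-up example (the tensor names and shapes are invented), not the repo's actual data loader; raising capacity and num_threads only helps when CPU-side enqueueing really is the bottleneck.

import numpy as np
import tensorflow as tf

# Fake corpus standing in for the real data (shapes and names are invented).
texts = tf.constant(np.random.randint(0, 30, size=(1000, 50)), dtype=tf.int32)
mels = tf.constant(np.random.randn(1000, 200, 80), dtype=tf.float32)

# Produce one (text, mel) example at a time from the in-memory tensors.
text, mel = tf.train.slice_input_producer([texts, mels], shuffle=True)

# capacity and num_threads are the knobs mentioned above.
text_batch, mel_batch = tf.train.batch(
    [text, mel],
    batch_size=32,
    num_threads=8,       # more enqueueing threads
    capacity=32 * 64)    # deeper prefetch queue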
@candlewill Are you running it with Python 3, or which Python version? I had a problem where I couldn't use the GPUs, and I'm guessing it was because I was using Python 2.7; it is discussed in the closed issue #5.
Try to see if you can find something useful there. Otherwise, if you manage to solve it, please let us know how you did it.
@basuam I am using Python 3.6.0 with Anaconda 4.3.1 (64-bit) and the GPU build of TensorFlow (1.1).
During training, both GPUs hold memory, but only one does the computation.
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    2     19759    C   /home/train01/heyunchao/anaconda3/bin/python 21912MiB |
|    3     19759    C   /home/train01/heyunchao/anaconda3/bin/python 21794MiB |
+-----------------------------------------------------------------------------+
Only GPU 2 is used for computation, and its GPU-Util stays at 0% for long periods.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M40 24GB      On   | 0000:02:00.0     Off |                    0 |
| N/A   19C    P8    18W / 250W |      0MiB / 22939MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M40 24GB      On   | 0000:03:00.0     Off |                    0 |
| N/A   21C    P8    17W / 250W |      0MiB / 22939MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M40 24GB      On   | 0000:83:00.0     Off |                    0 |
| N/A   37C    P0    65W / 250W |  21916MiB / 22939MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M40 24GB      On   | 0000:84:00.0     Off |                    0 |
| N/A   32C    P0    57W / 250W |  21798MiB / 22939MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
@candlewill I believe the code needs a slight modification to use both GPUs, because multi-GPU placement is not TensorFlow's default. I can't confirm this, since I've never trained it with more than one GPU, but I believe you have to assign work to both GPUs manually; otherwise tensorflow-gpu simply computes on one GPU. Have you trained other networks before without explicitly declaring which GPUs to use, and checked whether they used them all?
To my understanding, this may be the reason: if you don't explicitly specify how ops should be placed across multiple GPUs, TensorFlow puts all computation on the first GPU by default, but allocates memory on all visible GPUs.
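A quick way to see this behavior is to log device placement. The snippet below is generic TF 1.x, not code from this repo, and the matmul is just a stand-in for the real graph.

import tensorflow as tf

# Print where each op is placed; with no tf.device() annotations the compute
# ops land on /gpu:0, while memory is still reserved on every visible GPU.
config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.allow_growth = True   # optional: don't grab all GPU memory up front

a = tf.random_normal([1024, 1024])
b = tf.matmul(a, a)

with tf.Session(config=config) as sess:
    sess.run(b)

# Alternatively, hide the GPUs you don't want TensorFlow to touch:
#   CUDA_VISIBLE_DEVICES=2 python train.py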
candlewill's explanation is correct. I added train_multi_gpus.py for using multiple GPUs.
@basuam @candlewill Would you run and check the file? In my environment (3 x GTX 1080), the time per epoch has dropped to roughly one third. But I'm not sure it's error-free, because this is the first time I've written code for multiple GPUs.
@Kyubyong In the current train.py code, training runs entirely on the CPU (see here). I commented out this line to allow the use of one GPU.
Then I compared the time per epoch between train_multi_gpus.py and train.py. I found that the multi-GPU version takes longer, about 220 seconds per epoch, while the single-GPU version takes about 110 seconds.
My experiment environment is four Tesla K40m GPUs.
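For anyone following along, the change candlewill describes amounts to dropping the CPU pin around the graph. The snippet below is a simplified stand-in (the ops are fake), not the actual train.py code:

import tensorflow as tf

# Before (simplified): the whole training graph was pinned to the CPU,
# so the GPUs only held memory but did no work.
#
#   with tf.device('/cpu:0'):
#       build_training_graph()
#
# After: drop the pin, or place the graph on a GPU explicitly.
with tf.device('/gpu:0'):
    x = tf.random_normal([256, 256])          # stand-in for the real model ops
    loss = tf.reduce_mean(tf.matmul(x, x))

with tf.Session() as sess:
    sess.run(loss)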
Did you run train_multi_gpus.py?
Yes. It takes longer, about 220 seconds per epoch.
You changed the value of num_gpus in hyperparams.py, didn't you?
Yes, I changed the value to 4.
One possibility is the batch size. If you have 4 GPUs, you have to multiply hp.batch_size by 4 for a fair comparison. If you look at the code, each mini-batch is split into 4 pieces, and each piece is fed to one GPU tower.
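For context, the tower pattern described here looks roughly like the sketch below. It is a generic data-parallel example (the model and sizes are invented), not the actual train_multi_gpus.py:

import tensorflow as tf

NUM_GPUS = 4
BATCH_SIZE = 32 * NUM_GPUS   # scale the global batch so each tower still gets 32

def tower_loss(x):
    # Stand-in for the per-tower model; the real code builds the Tacotron graph.
    w = tf.get_variable("w", [256, 256])
    return tf.reduce_mean(tf.square(tf.matmul(x, w)))

inputs = tf.random_normal([BATCH_SIZE, 256])   # stand-in for the input batch
shards = tf.split(inputs, NUM_GPUS, axis=0)    # one shard per GPU tower

opt = tf.train.AdamOptimizer(1e-3)
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(NUM_GPUS):
        with tf.device('/gpu:%d' % i):
            loss = tower_loss(shards[i])
            tf.get_variable_scope().reuse_variables()   # share weights across towers
            tower_grads.append(opt.compute_gradients(loss))

# Average the gradients across towers, then apply them once.
avg_grads = []
for grads_and_vars in zip(*tower_grads):
    grads = tf.stack([g for g, _ in grads_and_vars])
    avg_grads.append((tf.reduce_mean(grads, axis=0), grads_and_vars[0][1]))
train_op = opt.apply_gradients(avg_grads)

With this layout, the single-GPU hp.batch_size corresponds to BATCH_SIZE / NUM_GPUS per tower, which is why the global batch size has to be scaled up for a fair timing comparison.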
@candlewill Oh, and I removed the tf.device('/cpu:0') line. I had forgotten to remove it. Thanks.
@candlewill Did you find out why the multi-GPU version is slower than the single-GPU one? For me, the former is definitely much faster than the latter.
I had forgotten to multiply batch_size by num_gpus when training.
I am running into a similar issue: GPU usage (AWS p2.xlarge, Deep Learning CUDA 9 Ubuntu AMI) is low on average, while CPU usage is always at its peak. Using Python 2.7 or 3 makes no difference. It's as if the GPU were used only for some particular task that is seldom invoked.
ubuntu@ip-172-31-13-191:~$ nvidia-smi
Sat Nov 4 08:20:19 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   75C    P0    93W / 149W |  10984MiB / 11439MiB |     50%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     10238    C   python3                                      10971MiB |
+-----------------------------------------------------------------------------+
ubuntu@ip-172-31-13-191:~$ nvidia-smi
Sat Nov 4 08:20:24 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   75C    P0    73W / 149W |  10984MiB / 11439MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     10238    C   python3                                      10971MiB |
+-----------------------------------------------------------------------------+
ubuntu@ip-172-31-13-191:~$
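A couple of manual snapshots taken seconds apart don't show much. A throwaway poller like the one below (not part of the repo) logs utilization once per second, which makes it easy to see whether the GPU is mostly idle between short bursts of work:

import subprocess
import time

# Print GPU utilization and memory use once per second (Ctrl-C to stop).
while True:
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=index,utilization.gpu,memory.used",
         "--format=csv,noheader"])
    print(out.decode().strip())
    time.sleep(1)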
@candlewill - I am facing the same low GPU usage issue that @aijanai describes. Could you please indicate which line you commented out to get full utilization on a single GPU? The line mentioned in your comment is already commented out in the code as it stands now, and yet performance is still slow, hence the confusion.
candlewill's explanation is correct. I added train_multi_gpus.py for using multiple GPUs. @basuam @candlewill Would you run and check the file? In my environment (3 x GTX 1080), the time per epoch has dropped to roughly one third. But I'm not sure it's error-free, because this is the first time I've written code for multiple GPUs.
Can you please share train_multi_gpus.py? It is not available now, and with train.py I can't train on the GPUs.