How do I know the GPU is being used when I run the Deep Learning ... notebook?

Open nateGeorge opened this issue 9 years ago • 3 comments

I'm trying to run the Deep Learning demo notebook, and training is taking a really long time. It also doesn't look like it's using the GPU. I'm on an Amazon EC2 g2.2xlarge with the NVIDIA Corporation GK104GL [GRID K520] (rev a1). I tried some of the solutions here: https://github.com/karpathy/char-rnn/issues/89, like

require 'cunn'
require 'cutorch'

and th -l cutorch and th -l cunn from the command line. However, when I run the line

trainer:train(trainset)

it just seems to sit there in progress and doesn't go anywhere. I also checked the GPU usage with nvidia-smi, and it looks like this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 361.77                 Driver Version: 361.77                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |
| N/A   31C    P8    26W / 125W |    121MiB /  4036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      7379    C   /home/ubuntu/torch/install/bin/luajit          119MiB |
+-----------------------------------------------------------------------------+

Memory usage jumps and the process shows up after require 'cutorch', but the memory usage never increases after that. GPU-Util sits at 0%. I have CUDA installed; nvcc --version gives:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Wed_May__4_21:01:56_CDT_2016
Cuda compilation tools, release 8.0, V8.0.26

It's running on Ubuntu 16.04. I verified the samples are working, and CUDA isn't giving any errors. Any ideas why it wouldn't be using the GPU?

nateGeorge avatar Sep 21 '16 05:09 nateGeorge

+1

pankajkumar avatar Oct 12 '17 12:10 pankajkumar

Are you sure you converted your network and criterion to CUDA?
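
For reference, a minimal sketch of what that conversion can look like in Torch7. The tiny model and random data here are hypothetical stand-ins, not the notebook's actual network or dataset. The point is only that the network, the criterion, and the tensors fed to them each need a :cuda() call; requiring cutorch and cunn alone does not move anything to the GPU.

```lua
require 'nn'
require 'cutorch'
require 'cunn'

-- Hypothetical stand-in for the notebook's network and criterion.
local net = nn.Sequential()
net:add(nn.Linear(10, 2))
net:add(nn.LogSoftMax())
local criterion = nn.ClassNLLCriterion()

-- Move the model parameters and the criterion to the GPU.
net = net:cuda()
criterion = criterion:cuda()

-- The data must live on the GPU as well (random placeholder data here).
local input = torch.rand(4, 10):cuda()
local target = torch.Tensor({1, 2, 1, 2}):cuda()

-- Quick check of where the tensors live before training.
print(torch.type(input))

-- One forward pass; with everything converted, this runs on the GPU.
local output = net:forward(input)
local loss = criterion:forward(output, target)
print(torch.type(output), loss)
```

In the notebook, the same pattern would apply to its own net and criterion, and to the training tensors (assuming they are stored as tensors on trainset), before calling trainer:train(trainset). If torch.type reports torch.FloatTensor or torch.DoubleTensor for the data, it has not been moved to the GPU.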

mhmtsarigul avatar Oct 12 '17 13:10 mhmtsarigul

nvidia-smi only reports the stats at the moment it is run. Run it in a loop, or use 'watch -n 1 nvidia-smi', if you haven't already tried that.

TheRum avatar Apr 21 '18 13:04 TheRum