texture_nets
invalid device ordinal
I have installed all the dependencies this repository needs, but something goes wrong when running the command below:
th test.lua -input_image /data/artwork/content/huaban.jpeg -model_t7 data/checkpoints/model.t7 -gpu 0
```
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-3022/cutorch/init.c line=734 error=10 : invalid device ordinal
/root/AI/torch/install/bin/luajit: test.lua:26: cuda runtime error (10) : invalid device ordinal at /tmp/luarocks_cutorch-scm-1-3022/cutorch/init.c:734
stack traceback:
        [C]: in function 'setDevice'
        test.lua:26: in main chunk
        [C]: in function 'dofile'
        ...t/AI/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00406670
```

Below is my GPU info:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P5000        Off  | 0000:03:00.0      On |                  Off |
| 26%   39C    P8     8W / 180W |    110MiB / 16264MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1387    G   /usr/lib/xorg/Xorg                             108MiB |
+-----------------------------------------------------------------------------+
```

I really have no idea where the problem is.
Hi, I think Torch enumerates GPUs from 1. If you have only one GPU you can omit this argument.
-- Best, Dmitry
Hi Dmitry, thank you for your reply, but it still fails when I omit the -gpu argument. What confuses me is that chainer-fast-neuralstyle, implemented in Python, also has a '-gpu' argument, and it runs fine when I set -gpu 0.
Hi, this issue still persists. Has anyone found a solution for it?
The GPU index starts from 1; please try the option -gpu 1 instead of -gpu 0.
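For example, at the th prompt you can list the devices cutorch sees and pick one explicitly. This is just a minimal sketch assuming cutorch loads at all; the 1-based index is cutorch's convention, not the 0-based one the CUDA runtime and the Python frameworks use, and the property field names may vary slightly between versions:

```lua
require 'cutorch'

-- cutorch numbers devices 1..N, unlike the CUDA runtime's 0..N-1
local n = cutorch.getDeviceCount()
print('devices visible to cutorch: ' .. n)

for i = 1, n do
  local props = cutorch.getDeviceProperties(i)
  print(string.format('  [%d] %s  (%.0f MiB)', i, props.name, props.totalGlobalMem / 2^20))
end

-- this is the call that fails in test.lua:26 when it is given 0
cutorch.setDevice(1)
print('now using device ' .. cutorch.getDevice())
```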
I also get this error, whichever GPU id I pass. cuDNN works fine with Chainer.
My setup: Ubuntu 16.04, Torch7, CUDA 9.2, cuDNN 7.1.4

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 970     Off  | 00000000:01:00.0  On |                  N/A |
|  0%   46C    P8    17W / 163W |    455MiB /  4040MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0       958    G   /usr/lib/xorg/Xorg                             287MiB |
|    0      1897    G   compiz                                         164MiB |
+-----------------------------------------------------------------------------+
```
I think it might be because Torch7 defaults to cuDNN R5?!
I had to run

```
git clone https://github.com/soumith/cudnn.torch.git -b R7
cd cudnn.torch
luarocks make cudnn-scm-1.rockspec
```

to get cuDNN 7 recognized by Torch, and I had to re-run `luarocks install cunn` and `luarocks install cutorch` after that, but I still get this same "invalid device ordinal" error.
Maybe there is some sort of version mismatch between cudnn, cunn, and cutorch? I don't know where versions of cunn and cutorch compatible with cudnn.torch R7 might be located. Does anyone have a clue?
I'm not used to Ubuntu and Lua :S
I found https://github.com/torch/cutorch/issues and, yeah, it doesn't look like they support CUDA 9 yet, so that's probably the issue here, I think. :/ If anyone has any insights beyond "try downgrading", I'd appreciate the input.
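For anyone else debugging this, a quick sanity check at the th prompt shows whether the bindings load at all and which cuDNN version cudnn.torch was built against. This is only a rough sketch, assuming cutorch and cudnn.torch are installed; treat the exact fields as version dependent:

```lua
require 'cutorch'
require 'cudnn'

-- cuDNN version the cudnn.torch bindings were compiled against (e.g. 7xxx for the R7 branch)
print('cudnn.version: ' .. tostring(cudnn.version))

-- devices visible to cutorch; if this errors or prints 0, the problem is below
-- the Lua layer (driver / CUDA toolkit mismatch), not in texture_nets itself
print('device count:  ' .. tostring(cutorch.getDeviceCount()))
```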