neuraltalk2

THCTensorMathPointwise.cu line=40 error=8 : invalid device function

Open · bbhushan-ds opened this issue 8 years ago · 7 comments

While running the following command I get an error, and I am not able to run the NeuralTalk2 demo:

th eval.lua -model ~/model_id1-501-1448236541.t7 -image_folder ~/iot/images/ -num_images 10

DataLoaderRaw loading images from folder: /home/.../iot/images/

listing all images in directory /home/.../iot/images/

DataLoaderRaw found 10 images

constructing clones inside the LanguageModel

THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-5816/cutorch/lib/THC/generic/THCTensorMathPointwise.cu line=40 error=8 : invalid device function
/home/ptcuser/torch/install/bin/luajit: ./misc/net_utils.lua:75: cuda runtime error (8) : invalid device function at /tmp/luarocks_cutorch-scm-1-5816/cutorch/lib/THC/generic/THCTensorMathPointwise.cu:40
stack traceback:
    [C]: in function 'add'
    ./misc/net_utils.lua:75: in function 'prepro'
    eval.lua:117: in function 'eval_split'
    eval.lua:173: in main chunk
    [C]: in function 'dofile'
    ...user/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

Thanks in advance!!

bbhushan-ds, Aug 03 '16

Hi, I have the same issue. Is it related to Torch installation, or rather the GPU compute capability?

jeiranj, Sep 16 '16

I'm also getting a very similar "invalid device function" error. When I disable the use of cuda() and CudaTensors, some of the problems go away, but I'm not sure what a more viable fix is, since I eventually need to use CUDA.

suryabhupa, Sep 28 '16

I have a similar problem:

THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-5062/cutorch/lib/THC/THCTensorCopy.cu line=205 error=8 : invalid device function
/home/nady/newriver/torch/install/bin/luajit: ./misc/net_utils.lua:31: cuda runtime error (8) : invalid device function at /tmp/luarocks_cutorch-scm-1-5062/cutorch/lib/THC/THCTensorCopy.cu:205
stack traceback:
    [C]: in function 'copy'
    ./misc/net_utils.lua:31: in function 'build_cnn'
    train.lua:122: in main chunk

aicentral, Nov 09 '16

You should definitely check your NVIDIA drivers and the compatibility of your GPUs with your code. Make sure all the right dependencies (e.g. CUDA) are installed properly. Reinstalling Torch may also help.

suryabhupa, Nov 09 '16

Did this ever find a resolution?

I've got two GPUs in my machine: a GTX 980 Ti and a newer GTX 1070.

If I run it on the 980 Ti it works, but on the GTX 1070 I get:

THCTensorMathPointwise.cu error=8 : invalid device function

I'm on Ubuntu 16.04 with CUDA 8, cuDNN 5.1, and driver version 367.57.

filmo, Jan 15 '17

For anybody who comes across this: the solution was to reinstall cutorch and cunn.

It seems that luarocks compiles the CUDA code only for the cards found in the machine at the time Torch is installed. In my case, I had originally installed Torch with just my GTX 980 Ti card (compute capability 5.2).

The GTX 1070 (like the 1060 and 1080) has compute capability 6.1, which is not covered by a build made only for 5.2.
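
The failure mode described above can be sketched as a small compatibility check. This is a minimal Python sketch, assuming the build embeds only compiled GPU binaries (cubins) for each detected architecture and no PTX for JIT fallback; `has_compatible_binary` is a hypothetical helper, not part of cutorch. It encodes the CUDA binary-compatibility rule: a binary compiled for compute capability X.Y runs only on devices with the same major version X and a minor version of at least Y.

```python
from typing import Iterable, Tuple

# (major, minor) compute capability, e.g. (6, 1) for a GTX 1070
ComputeCapability = Tuple[int, int]


def has_compatible_binary(built_for: Iterable[ComputeCapability],
                          device: ComputeCapability) -> bool:
    """Return True if any compiled architecture can run on `device`.

    Hypothetical helper. Assumes SASS-only builds (no PTX): a binary for
    X.Y runs on devices with the same major version X and minor >= Y.
    """
    dev_major, dev_minor = device
    return any(major == dev_major and minor <= dev_minor
               for major, minor in built_for)


# Torch originally installed with only the GTX 980 Ti (5.2) present:
print(has_compatible_binary([(5, 2)], (5, 2)))  # True: runs on the 980 Ti
print(has_compatible_binary([(5, 2)], (6, 1)))  # False: no kernel for the 1070
```

Under these assumptions, the `False` case is the situation that surfaces at runtime as "invalid device function": the kernel exists, but not in a form the GTX 1070 can execute.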

The problem went away after I did:

luarocks install cutorch
luarocks install cunn

This recompiles them with the appropriate compute capabilities.

Here's some of the output from the build. Notice how it autodetects the capabilities of the various cards:

~~~ snip ~~~
-- MAGMA not found. Compiling without MAGMA support
-- Autodetected CUDA architecture(s): 6.1 3.5 5.2
~~~ snip ~~~
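
To confirm that a rebuild picked up a newly added card, one could pull the architecture list out of the CMake line shown above and check it for that card's compute capability. This is a minimal Python sketch; `parse_autodetected_archs` is a hypothetical helper, and it assumes the log line has exactly the `-- Autodetected CUDA architecture(s): ...` form shown in the snippet, with plain `X.Y` entries.

```python
import re
from typing import List, Tuple


def parse_autodetected_archs(log_text: str) -> List[Tuple[int, int]]:
    """Extract (major, minor) pairs from a line such as
    '-- Autodetected CUDA architecture(s): 6.1 3.5 5.2'.

    Hypothetical helper; returns an empty list if no such line is present.
    """
    match = re.search(r"Autodetected CUDA architecture\(s\):\s*([\d. ]+)", log_text)
    if match is None:
        return []
    archs = []
    for token in match.group(1).split():
        major, minor = token.split(".")
        archs.append((int(major), int(minor)))
    return archs


build_log = """-- MAGMA not found. Compiling without MAGMA support
-- Autodetected CUDA architecture(s): 6.1 3.5 5.2"""

archs = parse_autodetected_archs(build_log)
print(archs)            # [(6, 1), (3, 5), (5, 2)]
print((6, 1) in archs)  # True: the GTX 1070's architecture is now included
```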

See here for more details

filmo, Jan 15 '17

@filmo Thanks. I ran into the same issue when I moved my container between VMs, and reinstalling works. You may run into a compiler error during the reinstall; in that case, you need to reinstall torch first:

luarocks install torch

hli000, May 03 '17