word-rnn

Unable to use with OpenCL

Status: Open • albanlv opened this issue 9 years ago • 10 comments

I can't get it to work with OpenCL. It seems to require fbcunn, which in turn requires CUDA and therefore an NVIDIA GPU.

I tried to use it with OpenCL, but I get the error "No Luarocks module found for fbcunn", and after installing fbcunn, "no CUDA-capable device is detected". That is expected, since I don't have such a device, but annoying, since I was passing the option "-opencl 1".

albanlv, Feb 03 '16 14:02

Apologies for not getting back to you sooner (I somehow stopped getting emails for issues).

You can try again now, with the latest code, as the fbcunn dependency has been removed.

There may still be some CUDA dependencies left in there, because I'm lazy, but as far as I know nothing in there couldn't run on OpenCL in principle. You might have to replace some torch.CudaTensor() calls with their OpenCL equivalents.

larspars, Feb 28 '16 12:02

@larspars Hey, I'm also having issues running with OpenCL.

This is the output I get when running the basic command th train.lua -opencl 1:

libthclnn_searchpath    /Users/chris/torch/install/lib/lua/5.1/libTHCLNN.so
using OpenCL on GPU 0...
/Users/chris/torch/install/bin/luajit: /Users/chris/torch/install/share/lua/5.1/trepl/init.lua:384: ./util/SharedDropout.lua:3: attempt to call field 'CudaTensor' (a nil value)
stack traceback:
    [C]: in function 'error'
    /Users/chris/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
    train.lua:119: in main chunk
    [C]: in function 'dofile'
    ...hris/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010701fbd0

I see you mentioned replacing CudaTensors, which seems relevant here. Any chance you can give me a little more insight on how that would be done?

allthetime, Mar 03 '16 04:03

@larspars Oh, I figured it out.

I had to change line 3 in .../word-rnn/util/SharedDropout.lua: where it says torch.CudaTensor, I simply changed it to torch.Tensor, and now it works.

Perhaps the -opencl 1 option should trigger this change?

allthetime, Mar 03 '16 04:03

That works, but it keeps the tensor on the CPU. You should get better performance if you change it to torch.ClTensor instead.

larspars, Mar 03 '16 16:03
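A minimal sketch of the idea raised above, letting the -opencl 1 flag pick the tensor type instead of hand-editing SharedDropout.lua. The helpers noise_tensor_type and backend_from_opt are hypothetical, not part of word-rnn, and the option names opt.gpuid and opt.opencl are an assumption borrowed from char-rnn's train.lua. The type names 'torch.CudaTensor' and 'torch.ClTensor' are the ones registered by cutorch and cltorch respectively, so the matching package must already have been required.

```lua
-- Hypothetical helper (not part of word-rnn): map a backend name to the
-- Torch tensor type string used for SharedDropout's noise buffer.
-- 'torch.CudaTensor' is registered by cutorch and 'torch.ClTensor' by
-- cltorch; whichever is returned must already be loaded via require.
local function noise_tensor_type(backend)
  if backend == 'cuda' then
    return 'torch.CudaTensor'
  elseif backend == 'opencl' then
    return 'torch.ClTensor' -- note the lowercase "l"
  end
  -- CPU fallback: works everywhere, but the noise stays in main memory
  return 'torch.Tensor'
end

-- Hypothetical mapping from char-rnn-style options (assumption: word-rnn's
-- train.lua exposes opt.gpuid and opt.opencl the same way).
-- gpuid < 0 means CPU, even if opencl is set.
local function backend_from_opt(opt)
  if opt.gpuid >= 0 and opt.opencl == 1 then
    return 'opencl'
  elseif opt.gpuid >= 0 then
    return 'cuda'
  end
  return 'cpu'
end
```

Inside SharedDropout.lua the noise buffer could then be created with torch.Tensor():type(noise_tensor_type(backend)), since Torch's Tensor:type() accepts a type-name string and returns a tensor of that type. How the backend value reaches SharedDropout.lua (a global, a constructor argument, etc.) is left open here.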

@larspars Oh, thanks. I made a pull request; let me know what you think.

allthetime, Mar 03 '16 23:03

Looks like ClTensor is misspelled; it should have a capital C :)

larspars, Aug 02 '17 22:08

Hi, Lars,

In the end, it works for me with torch.ClTensor.

Many thanks.

lxyangAI, Aug 09 '17 20:08

Hi, Lars.

FYI, I ran your programs successfully on the GPU on a Mac, for both training and sampling.

To test it on other text, such as a news article of a few hundred words, how should I prepare the data? The current data file (*.t7) is binary and has its own format.

Thank you & Best regards.

lxyangAI, Aug 15 '17 21:08

To train on different text, pass -data_dir pointing to a folder that contains a file named "input.txt". With -word_level 1, this file is tokenized by splitting on whitespace. You may want to convert the text to lowercase first.

To load a pretrained model into your own code, you can use torch.load("checkpoint_filename.t7").

larspars, Aug 16 '17 07:08
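A minimal sketch of the data preparation described in the reply about -data_dir: lowercase the raw text and write it to input.txt inside the data folder. The helper names (normalize, whitespace_tokens, write_input) are hypothetical, and the whitespace split only mirrors the tokenization described for -word_level 1; word-rnn's own loader may differ in details.

```lua
-- Sketch (assumption): prepare a plain-text corpus for word-level training.

-- Lowercase the raw text (string.lower only affects ASCII letters in the C locale).
local function normalize(raw)
  return raw:lower()
end

-- Split on runs of whitespace, returning a list of word tokens.
-- Useful for checking vocabulary size before training.
local function whitespace_tokens(text)
  local tokens = {}
  for word in text:gmatch('%S+') do
    tokens[#tokens + 1] = word
  end
  return tokens
end

-- Write the normalized text to <data_dir>/input.txt, where -data_dir expects it.
local function write_input(data_dir, raw)
  local f = assert(io.open(data_dir .. '/input.txt', 'w'))
  f:write(normalize(raw))
  f:close()
end
```

For example, write_input('data/news', article_text) followed by th train.lua -data_dir data/news -word_level 1, where data/news and article_text are placeholders. A pure whitespace split keeps punctuation attached, so "world." and "world" become separate tokens; separating punctuation beforehand would shrink the vocabulary, which matters for a corpus of only a few hundred words.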

It looks like the edits required for OpenCL have changed slightly. Two are now needed:

In SharedDropout.lua, line 50: change SharedDropout_noise[id] = torch.CudaTensor() to SharedDropout_noise[id] = torch.ClTensor()

and

In StochasticSkip.lua: comment out lines 2 and 3:

require 'cudnn'
require 'cunn'

This gets rid of the error:

module 'cunn' not found:No LuaRocks module found for cunn
	no field package.preload['cunn']

However, I'm not sure this is enough to fix the problem entirely, because I then run into the issue described here: https://github.com/larspars/word-rnn/issues/24

janelleshane, Oct 08 '17 02:10
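Instead of commenting out the require 'cudnn' and require 'cunn' lines in StochasticSkip.lua, a guarded load would keep the file usable on both CUDA and OpenCL machines. This is a sketch, not the repository's code: optional_require is a hypothetical helper built on Lua's standard pcall, and it assumes the rest of StochasticSkip.lua only touches cudnn/cunn on CUDA code paths.

```lua
-- Hypothetical helper: load a module if it is installed, otherwise return nil.
-- pcall catches the "module 'cunn' not found" error that require would raise.
local function optional_require(name)
  local ok, mod = pcall(require, name)
  if ok then
    return mod -- whatever require returned (the module table, or true)
  end
  return nil
end

-- Replacement for the unconditional require lines. On a machine without
-- CUDA both end up nil, so any CUDA-only code path must check for that
-- before using them.
local cudnn = optional_require('cudnn')
local cunn = optional_require('cunn')
```

Unlike commenting the lines out, this keeps CUDA users working unchanged. It does not address the separate failure linked above, which this guarded load would not by itself resolve.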