GRU4Rec
theano error
I ran this command:
$ python run.py /path/to/training_data_file -t /path/to/test_data_file -m 1 5 10 20 -ps loss=bpr-max,final_act=elu-.5,hidden_act=tanh,layers=100,adapt=adagrad,n_epochs=10,batch_size=32,dropout_p_embed=0.0,dropout_p_hidden=0.0,learning_rate=0.2,momentum=0.3,n_sample=2048,sample_alpha=0.0,bpreg=1.0,constrained_embedding=False
But I get this error:
/home/.../anaconda2/envs/gru4rec/lib/python3.6/site-packages/theano/gpuarray/dnn.py:184: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to a version >= v5 and <= v7.
  warnings.warn("Your cuDNN version is more recent than "
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/home/.../anaconda2/envs/gru4rec/lib/python3.6/site-packages/theano/gpuarray/__init__.py", line 227, in
    use(config.device)
  File "/home/.../anaconda2/envs/gru4rec/lib/python3.6/site-packages/theano/gpuarray/__init__.py", line 214, in use
    init_dev(device, preallocate=preallocate)
  File "/home/.../anaconda2/envs/gru4rec/lib/python3.6/site-packages/theano/gpuarray/__init__.py", line 117, in init_dev
    context.cudnn_handle = dnn._make_handle(context)
  File "/home/.../anaconda2/envs/gru4rec/lib/python3.6/site-packages/theano/gpuarray/dnn.py", line 130, in _make_handle
    "This can be a sign of a too old driver.", err)
RuntimeError: ('Error creating cudnn handle. This can be a sign of a too old driver.', 1)
SET loss TO bpr-max (type: <class 'str'>)
SET final_act TO elu-0.5 (type: <class 'str'>)
SET hidden_act TO tanh (type: <class 'str'>)
SET layers TO [100] (type: <class 'list'>)
SET adapt TO adagrad (type: <class 'str'>)
SET n_epochs TO 10 (type: <class 'int'>)
SET batch_size TO 32 (type: <class 'int'>)
SET dropout_p_embed TO 0.0 (type: <class 'float'>)
SET dropout_p_hidden TO 0.0 (type: <class 'float'>)
SET learning_rate TO 0.2 (type: <class 'float'>)
SET momentum TO 0.3 (type: <class 'float'>)
SET n_sample TO 2048 (type: <class 'int'>)
SET sample_alpha TO 0.0 (type: <class 'float'>)
SET bpreg TO 1.0 (type: <class 'float'>)
SET constrained_embedding TO False (type: <class 'bool'>)
Loading training data...
Loading data from TAB separated file: examples/rsc15/processed/rsc15_train_tr.txt
Started training
The dataframe is not sorted by SessionId, sorting now
Data is sorted in 46.12
Traceback (most recent call last):
  File "run.py", line 109, in
    gru.fit(data, sample_store=args.sample_store_size, store_type='gpu')
  File "/home/../GRU4Rec/gru4rec.py", line 556, in fit
    generate_samples = theano.function([], updates=updates_st)
  File "/home/../anaconda2/envs/gru4rec/lib/python3.6/site-packages/theano/compile/function.py", line 317, in function
    output_keys=output_keys)
  File "/home/../anaconda2/envs/gru4rec/lib/python3.6/site-packages/theano/compile/pfunc.py", line 486, in pfunc
    output_keys=output_keys)
  File "/home/../anaconda2/envs/gru4rec/lib/python3.6/site-packages/theano/compile/function_module.py", line 1841, in orig_function
    fn = m.create(defaults)
  File "/home/../anaconda2/envs/gru4rec/lib/python3.6/site-packages/theano/compile/function_module.py", line 1715, in create
    input_storage=input_storage_lists, storage_map=storage_map)
  File "/home/../anaconda2/envs/gru4rec/lib/python3.6/site-packages/theano/gof/link.py", line 699, in make_thunk
    storage_map=storage_map)[:3]
  File "/home/../anaconda2/envs/gru4rec/lib/python3.6/site-packages/theano/gof/vm.py", line 1091, in make_all
    impl=impl))
  File "/home/../anaconda2/envs/gru4rec/lib/python3.6/site-packages/theano/gof/op.py", line 955, in make_thunk
    no_recycling)
  File "/home/../anaconda2/envs/gru4rec/lib/python3.6/site-packages/theano/gof/op.py", line 858, in make_c_thunk
    output_storage=node_output_storage)
  File "/home/../anaconda2/envs/gru4rec/lib/python3.6/site-packages/theano/gof/cc.py", line 1217, in make_thunk
    keep_lock=keep_lock)
  File "/home/../anaconda2/envs/gru4rec/lib/python3.6/site-packages/theano/gof/cc.py", line 1157, in compile
    keep_lock=keep_lock)
  File "/home/../anaconda2/envs/gru4rec/lib/python3.6/site-packages/theano/gof/cc.py", line 1641, in cthunk_factory
    *(in_storage + out_storage + orphd))
RuntimeError: ('The following error happened while compiling the node', GpuBinarySearchSorted{context_name=None, dtype_int64=True}(GpuFromHost<None>.0, GpuFromHost<None>.0), '\n', 'GpuKernel_init error 3: nvrtcCompileProgram: NVRTC_ERROR_BUILTIN_OPERATION_FAILURE')
OS: Debian 4.9.110-3+deb9u4~deb8u1 (2018-08-24) x86_64 GNU/Linux
cudnn: 7.6
cuda: 9.2
theano: 1.0.4
pygpu: 0.7.6
libgpuarray: 0.7.6
Could anyone help?
This is not a GRU4Rec-related error, but a sign that something in your Theano setup is not correct. (It's not even the fault of Theano: there is some kind of incompatibility between the driver, cuda and cuDNN on your system.)
Some ideas on what could have gone wrong:
- Maybe your nVidia driver is too old (as the error message suggests). You should have at least r396.26, but newer drivers also work (for example, I use r430.14). If your driver is old, try upgrading it. (When upgrading the driver, pay attention to the runfile vs. package issue.)
- Maybe the cuDNN you use was built for a different cuda version (there are different cuDNN 7.6 libs built for cuda 9.2, cuda 10.0, cuda 10.2, etc). Double check that your cuda and cuDNN match.
- Maybe when you built libgpuarray/pygpu, you did it in a different environment and it links to a different cuda/cuDNN version than you are using currently.
- It is less likely, but maybe a different cuda version is in your PATH, LD_LIBRARY_PATH or CUDA_HOME (or these environment variables don't point to any cuda version).
- You can also try disabling cuDNN usage in Theano by running it with the appropriate flag: THEANO_FLAGS=dnn.enabled=False python run.py ... I don't think that this has a significant impact on the speed of GRU4Rec, but I haven't tried yet. If setting this flag helps, the issue is definitely some kind of incompatibility between your cuDNN and cuda (and/or nVidia driver).
- You can also try to delete the theano cache (theano-cache purge or manually deleting the contents of ~/.theano/), maybe something is stuck there from a previous version.
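A rough way to check the PATH / LD_LIBRARY_PATH / CUDA_HOME idea above is a small Python script like the one below. This is only a sketch: the function names are made up for illustration, and it assumes cuda lives in versioned directories such as /usr/local/cuda-9.2, so an unversioned symlink like /usr/local/cuda will not be detected.

```python
import os
import re

# Environment variables that the suggestion above says to check.
CUDA_VARS = ("PATH", "LD_LIBRARY_PATH", "CUDA_HOME")

# Matches versioned directory names such as "cuda-9.2" or "cuda-10.0".
CUDA_DIR = re.compile(r"cuda-(\d+\.\d+)")


def cuda_versions_in_env(env):
    """Map each variable in CUDA_VARS to the cuda versions named in its value."""
    found = {}
    for name in CUDA_VARS:
        value = env.get(name, "")
        found[name] = sorted(set(CUDA_DIR.findall(value)))
    return found


def report(env):
    """Return warnings about variables without a cuda path and about mixed versions."""
    versions = cuda_versions_in_env(env)
    warnings = []
    for name in CUDA_VARS:
        if not versions[name]:
            warnings.append(f"{name} does not mention a versioned cuda directory")
    # Versions are compared as strings, so the order is alphabetical, not numeric.
    all_versions = sorted({v for vers in versions.values() for v in vers})
    if len(all_versions) > 1:
        warnings.append("mixed cuda versions found: " + ", ".join(all_versions))
    return warnings


if __name__ == "__main__":
    for line in report(os.environ) or ["no obvious cuda path problems found"]:
        print(line)
```

Run it inside the same shell and environment in which you start run.py, since that is where the variables matter. An empty result does not prove the setup is fine; it only rules out the most obvious path mix-up.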
If nothing else works, you can set up the whole environment from the ground up. I usually do it this way, because then I know exactly what was installed. It has worked for me 100% of the time. The main steps are:
- Uninstall your current cuda and nVidia driver (if these are installed)
- Install the nVidia driver (>=396.26)
- Install cuda 9.2
- Set the CUDA_HOME environment variable to <cuda_install_dir> (default: /usr/local/cuda-9.2), and make sure that <cuda_install_dir>/bin and <cuda_install_dir>/lib64 are added to PATH and LD_LIBRARY_PATH respectively.
- Download the appropriate cuDNN libs and copy them under cuda/lib64
- (If you start on a new machine, this is the point where you also install python and the required packages (optionally in pyenv/virtualenv))
- Install pycuda
- Build and install libgpuarray & pygpu
- Build and install Theano
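After the driver, cuda and environment variable steps above, a quick sanity check along the following lines may save some time before building libgpuarray and Theano. It is a minimal sketch, not part of GRU4Rec or Theano: check_setup and its helpers are hypothetical names, and the driver version has to be passed in by hand (for example the value printed by nvidia-smi --query-gpu=driver_version --format=csv,noheader).

```python
import os
import sys


def parse_version(text):
    """Turn a version such as '396.26' or 'r430.14' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("r").split("."))


def split_paths(value):
    """Split a colon-separated path list and drop trailing slashes."""
    return [entry.rstrip("/") for entry in value.split(":") if entry]


def check_setup(env, driver_version, min_driver="396.26"):
    """Return a list of problems with the driver version and the cuda paths."""
    problems = []
    if parse_version(driver_version) < parse_version(min_driver):
        problems.append(f"driver {driver_version} is older than {min_driver}")

    cuda_home = env.get("CUDA_HOME", "").rstrip("/")
    if not cuda_home:
        problems.append("CUDA_HOME is not set")
        return problems

    # <cuda_install_dir>/bin should be in PATH, <cuda_install_dir>/lib64 in LD_LIBRARY_PATH.
    if cuda_home + "/bin" not in split_paths(env.get("PATH", "")):
        problems.append(f"{cuda_home}/bin is not in PATH")
    if cuda_home + "/lib64" not in split_paths(env.get("LD_LIBRARY_PATH", "")):
        problems.append(f"{cuda_home}/lib64 is not in LD_LIBRARY_PATH")
    return problems


if __name__ == "__main__":
    if len(sys.argv) == 2:
        for problem in check_setup(os.environ, sys.argv[1]) or ["setup looks consistent"]:
            print(problem)
    else:
        print("pass the driver version as the only argument, e.g. 430.14")
```

The check only compares strings and version numbers; it does not verify that the cuDNN libs under lib64 were built for the installed cuda version, so that still has to be checked by hand.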
Thank you very much. I will check the problem according to your comments.