lua---nnx
SoftMaxTree...
I'm seeing:
/usr/local/share/lua/5.1/nnx/SoftMaxTree.lua:171: attempt to call field 'SoftMaxTree_updateOutput' (a nil value)
stack traceback:
/usr/local/share/lua/5.1/nnx/SoftMaxTree.lua:171: in function 'func'
/usr/local/share/lua/5.1/nngraph/gmodule.lua:252: in function 'neteval'
/usr/local/share/lua/5.1/nngraph/gmodule.lua:287: in function 'forward'
runtrain.lua:194: in function 'opfunc'
I suspect it is related to cunnx failing to build:
/tmp/luarocks_cunnx-scm-1-1265/cunnx/SoftMaxTree.cu(439): error: argument of type "THCudaTensor *" is incompatible with parameter of type "THCudaIntTensor *"
/tmp/luarocks_cunnx-scm-1-1265/cunnx/BlockSparse.cu(99): error: argument of type "THCudaTensor *" is incompatible with parameter of type "THCudaLongTensor *"
/tmp/luarocks_cunnx-scm-1-1265/cunnx/BlockSparse.cu(100): error: argument of type "THCudaTensor *" is incompatible with parameter of type "THCudaLongTensor *"
/tmp/luarocks_cunnx-scm-1-1265/cunnx/WindowGate.cu(110): error: argument of type "THCudaTensor *" is incompatible with parameter of type "THCudaLongTensor *"
/tmp/luarocks_cunnx-scm-1-1265/cunnx/WindowGate2.cu(120): error: argument of type "THCudaTensor *" is incompatible with parameter of type "THCudaLongTensor *"
5 errors detected in the compilation of "/tmp/tmpxft_0000ff8c_00000000-7_init.cpp1.ii".
CMake Error at cunnx_generated_init.cu.o.cmake:262 (message):
Error generating file
/tmp/luarocks_cunnx-scm-1-1265/cunnx/build/CMakeFiles/cunnx.dir//./cunnx_generated_init.cu.o
CMakeFiles/cunnx.dir/build.make:63: recipe for target 'CMakeFiles/cunnx.dir/cunnx_generated_init.cu.o' failed
make[2]: *** [CMakeFiles/cunnx.dir/cunnx_generated_init.cu.o] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/cunnx.dir/all' failed
make[1]: *** [CMakeFiles/cunnx.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2
Error: Build error: Failed building.
Could this be related to Mac OS X? I noticed that the nnx build script wants to generate .so files but not .dylib files.
This is fixed by https://github.com/nicholas-leonard/cunnx/pull/23
Yes - fix confirmed, thanks!
I spoke too soon. While it compiles now, there seems to be a memory leak. Whatever batch size I use, and with or without L2 regularization, I get a CUDA out-of-memory error before the 2nd, 3rd, or 4th epoch.
@elbamos Do you think it has anything to do with this: https://github.com/nicholas-leonard/cunnx/commit/9ebc12ba9e287efcfe08b877156780c090f5befd ? My CUDA is rusty.
I don't know. When it comes to CUDA, I'm very much a user, not a programmer, I'm afraid. I know just enough to say that a memory leak is a suspect when three things hold: the amount of free GPU RAM after epoch 2 is substantially less than after epoch 1; nothing in the code changed other than to support the move to Tree; and dropping L2 regularization makes the run last longer (because the parameters would have to be copied to be multiplied).
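To make the "free GPU RAM after each epoch" check concrete, here is a minimal sketch. It assumes cutorch is installed and that `cutorch.getMemoryUsage` returns the free and total bytes for a device. The helper names `leakSuspected` and `freeGpuBytes` are hypothetical and not part of nnx or cunnx. A steadily shrinking reading is only a hint of a leak, not proof.

```lua
-- Hypothetical helper: given free-memory readings (in bytes) taken after each
-- epoch, return true if free memory dropped by more than `tolerance` at every
-- step. A single non-decreasing step clears the suspicion.
local function leakSuspected(freeAfterEpoch, tolerance)
  tolerance = tolerance or 0
  if #freeAfterEpoch < 2 then return false end
  for i = 2, #freeAfterEpoch do
    local drop = freeAfterEpoch[i - 1] - freeAfterEpoch[i]
    if drop <= tolerance then return false end
  end
  return true
end

-- Hypothetical helper: read free GPU memory on the current device.
-- Returns nil when cutorch cannot be loaded (e.g. no CUDA install).
local function freeGpuBytes()
  local ok, mod = pcall(require, 'cutorch')
  if not ok then return nil end
  local ct = cutorch or mod     -- cutorch also registers itself as a global
  collectgarbage()              -- free unreferenced tensors before measuring
  local free = ct.getMemoryUsage(ct.getDevice())
  return free
end

-- Intended use inside a training loop (sketch only):
--   local readings = {}
--   for epoch = 1, nEpochs do
--     trainOneEpoch()                       -- your own training function
--     readings[#readings + 1] = freeGpuBytes()
--   end
--   print('leak suspected:', leakSuspected(readings, 16 * 1024 * 1024))
```

Calling `collectgarbage()` before each reading matters here, because tensors that Lua has not yet collected would otherwise look like leaked memory.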
@elbamos It should be fixed with the newest commits. Please reinstall nnx and cunnx.
@nicholas-leonard 'Fraid not :( After installing the latest version, I get exactly the same result as before.
@elbamos What script are you running? I could try to reproduce on my end.
@nicholas-leonard Do you need the net design or the whole training script & data? How should I get it to you?
@elbamos Whole training script and data. You can share your repository with me or send it to me via email.
Has there been any update on this, please? I installed the newer version of nnx and am still getting the same error as above.
@abhisheksgumadi The unit tests and compilation pass on my end (using Ubuntu 14). Could you be more explicit about your issue?