char-rnn
memory problems and frequent crashes
When I try to train any network larger than the default size on the GPU, I pretty quickly run into the error:

cuda runtime error (77) : an illegal memory access was encountered at /tmp/luarocks_cutorch-scm-1-6753/cutorch/lib/THC/generic/THCStorage.c:147

I know I'm not running out of memory, so I assume this is some kind of segmentation fault.
It seems to work fine on the CPU, which implies that the problem is with cutorch (as the error message suggests), but since I'm doing this all on my personal computer and CPU training is an order of magnitude slower, I'd like to get GPU training working again.
Interestingly, after reinstalling everything, this now only occurs when training on my second GPU (-gpuid 1) but not on GPU 0, which is frustrating, because GPU 1 is a little bit faster. Better than not working at all, though.