
pygpu.gpuarray.GpuArrayException: out of memory on small corpus

Open • kuhanw opened this issue 7 years ago • 1 comment

Dear experts,

I have run into a GPU out-of-memory error while trying to train the skip-thoughts model on a corpus of 1 million sentences. The model in the paper was trained on far more than that: from Table 1 of the paper, https://arxiv.org/pdf/1506.06726.pdf, it looks like the training set contained 74 million sentences.

If this really is a GPU memory limitation, how was the model in the paper trained, and on what hardware?

I am currently working on an AWS instance with a single Tesla K80 GPU with 12 GB of memory. The error is shown below.

Thank you,

Kuhan

```
Traceback (most recent call last):
  File "training_notes.py", line 25, in <module>
    'adam', 64, model_name, vocab_name, 10, False)
  File "/home/ec2-user/py27_version/skip-thoughts/training/train.py", line 119, in trainer
    f_grad_shared, f_update = eval(optimizer)(lr, tparams, grads, inps, cost)
  File "/home/ec2-user/py27_version/skip-thoughts/training/optim.py", line 31, in adam
    v = theano.shared(p.get_value() * 0.)
  File "/home/ec2-user/anaconda3/envs/py27/lib/python2.7/site-packages/theano/compile/sharedvalue.py", line 268, in shared
    allow_downcast=allow_downcast, **kwargs)
  File "/home/ec2-user/anaconda3/envs/py27/lib/python2.7/site-packages/theano/gpuarray/type.py", line 669, in gpuarray_shared_constructor
    context=type.context)
  File "pygpu/gpuarray.pyx", line 915, in pygpu.gpuarray.array (pygpu/gpuarray.c:12223)
  File "pygpu/gpuarray.pyx", line 970, in pygpu.gpuarray.carray (pygpu/gpuarray.c:13105)
  File "pygpu/gpuarray.pyx", line 664, in pygpu.gpuarray.pygpu_fromhostdata (pygpu/gpuarray.c:9847)
  File "pygpu/gpuarray.pyx", line 301, in pygpu.gpuarray.array_copy_from_host (pygpu/gpuarray.c:5813)
pygpu.gpuarray.GpuArrayException: out of memory
```
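The traceback shows the failure happening while the Adam optimizer allocates a zero-filled copy of a parameter (`v = theano.shared(p.get_value() * 0.)`), i.e. before any training batch runs. A back-of-the-envelope estimate of parameter plus optimizer memory can show which setting dominates. This is a rough sketch, not the repo's exact architecture: the layer shapes (one GRU encoder, two GRU decoders that also take the sentence vector as input, one vocabulary-sized output layer), the sizes `n_words`, `dim_word`, `dim`, and the assumption of four float32 copies per parameter (parameter, gradient buffer, Adam's two moment buffers) are all illustrative assumptions.

```python
# Rough GPU-memory estimate for float32 parameters plus optimizer state.
# All layer shapes below are simplifying assumptions, not read from the repo.
FLOAT32_BYTES = 4


def gru_params(nin, dim):
    """Weights and biases of one GRU layer: 3 gates x (input, recurrent, bias)."""
    return 3 * nin * dim + 3 * dim * dim + 3 * dim


def skip_thought_params(n_words, dim_word, dim):
    """Approximate parameter count: embeddings, 1 encoder, 2 decoders, softmax."""
    embeddings = n_words * dim_word
    encoder = gru_params(dim_word, dim)
    # Assumption: each decoder sees the word embedding plus the dim-sized
    # sentence vector, modelled here as a wider GRU input.
    decoders = 2 * gru_params(dim_word + dim, dim)
    # Assumption: one vocabulary-sized output layer (weights + bias).
    softmax = dim * n_words + n_words
    return embeddings + encoder + decoders + softmax


def gpu_bytes(n_params, copies=4):
    """copies=4 assumes parameter + gradient buffer + Adam's two moment buffers."""
    return n_params * FLOAT32_BYTES * copies


if __name__ == "__main__":
    # Hypothetical sizes for illustration; substitute the values passed to trainer().
    for n_words in (20000, 100000, 500000):
        n = skip_thought_params(n_words, dim_word=620, dim=2400)
        print("n_words=%d: %.1fM params, ~%.1f GB with optimizer state"
              % (n_words, n / 1e6, gpu_bytes(n) / 1e9))
```

With these assumed shapes, the embedding table and the output layer grow linearly with the vocabulary size, so a large `n_words` can push the total past 12 GB on its own. The estimate ignores activations and Theano's temporary buffers, which add further memory that scales with batch size and sentence length.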

kuhanw, Apr 08 '17 21:04

I am also having the same issue with the NC-series and NV-series VMs in Azure. I was able to train on ~20k lines, but anything more resulted in GPU out-of-memory errors.
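One possible (unconfirmed) reason memory would grow with the number of lines is that the vocabulary grows with the corpus; if the vocabulary size is not capped, the embedding and output layers grow with it. The sketch below caps the vocabulary at a fixed size and maps everything else to an unknown-word id. The helper names `build_capped_vocab` and `encode`, and the convention of id 0 for end-of-sentence and id 1 for unknown words, are hypothetical and chosen for illustration only.

```python
from collections import Counter


def build_capped_vocab(sentences, n_words):
    """Keep the n_words - 2 most frequent tokens; ids 0 and 1 are reserved.

    Hypothetical convention: 0 = end of sentence, 1 = unknown word.
    """
    counts = Counter(w for s in sentences for w in s.split())
    vocab = {"<eos>": 0, "UNK": 1}
    for i, (word, _) in enumerate(counts.most_common(n_words - 2)):
        vocab[word] = i + 2
    return vocab


def encode(sentence, vocab):
    """Map tokens to ids, sending out-of-vocabulary words to UNK (1)."""
    return [vocab.get(w, 1) for w in sentence.split()] + [0]
```

Because the vocabulary size is fixed at `n_words`, the size of the vocabulary-dependent layers stays constant no matter how many sentences are added; only training time grows with corpus size.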

If I turn off Theano's GPU mode, it appears to load and train on >100k lines without problems. Training is much slower on the CPU, but at least it's an option.
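The comment does not say exactly how the GPU was turned off. One common way is the `device` entry in the `THEANO_FLAGS` environment variable, which Theano reads when it is first imported. The shell equivalent would be `THEANO_FLAGS=device=cpu python training_notes.py`. The helper below is a hypothetical sketch that rewrites an existing `THEANO_FLAGS` value to target the CPU while keeping the other flags.

```python
import os


def force_theano_cpu(env=None):
    """Rewrite THEANO_FLAGS so Theano targets the CPU.

    Theano reads THEANO_FLAGS when it is first imported, so call this
    before `import theano` (or before importing any module that imports it).
    """
    env = os.environ if env is None else env
    # Drop any existing device=... entry and keep the remaining flags.
    flags = [f for f in env.get("THEANO_FLAGS", "").split(",")
             if f and not f.strip().startswith("device=")]
    flags.append("device=cpu")
    env["THEANO_FLAGS"] = ",".join(flags)
    return env["THEANO_FLAGS"]
```

Usage, assuming Theano is installed: call `force_theano_cpu()` at the top of the training script, then `import theano`.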

HappyCoderMan, Oct 08 '17 04:10