LM-LSTM-CRF
Asking about a CUDA out-of-memory (OOM) error
I ran the code on Chinese NER training data (around 70 thousand sentences, with LM-LSTM-CRF set to co-train mode), and I got an OOM error:
When I set the batch_size to 10, it results in:
- Tot it 6916 (epoch 0): 6308it [26:09, 4.02it/s]THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train_wc.py", line 243, in <module>
    loss.backward()
  File "/usr/local/lib/python3.5/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/usr/local/lib/python3.5/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58
When I set the batch_size to 128, it results in:
- Tot it 543 (epoch 0): 455it [03:57, 1.91it/s]THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train_wc.py", line 241, in <module>
Could anyone give me some advice on how to solve it?
Hi, what type of GPU are you using, and how large is its memory?
For Chinese, even character-level language modeling results in a large dictionary (and therefore large GPU memory consumption), because the softmax of the language model has one output per vocabulary entry. One way to alleviate this is to filter out low-frequency words as unknown tokens, as in the sketch below.
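As an illustration of that kind of frequency cutoff (a minimal sketch; `build_vocab`, `min_count`, and the `<unk>` token are hypothetical names of my own, though the repo's `mini_count` option plays the same role):

```python
from collections import Counter

def build_vocab(sentences, min_count=5):
    """Keep only tokens seen at least min_count times; everything
    else maps to <unk>, shrinking the LM softmax layer."""
    freq = Counter(tok for sent in sentences for tok in sent)
    vocab = {'<unk>': 0}
    for tok, count in freq.items():
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab

def encode(sentence, vocab):
    # All filtered (rare) tokens share the single <unk> index.
    return [vocab.get(tok, vocab['<unk>']) for tok in sentence]
```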
The GPU is a Tesla K40c; we have four of them, each with 10 GB of memory. Using only one GPU and setting multi-GPU in the PyTorch code both give the same OOM error, and setting mini_count to 5 or even 10 doesn't work either. But if I don't use co_train, it works well.
Yes, language modeling for Chinese is a little tricky. I think some modification of the model is necessary to make it work.
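One possible modification (my suggestion, not something implemented in this repo) is to replace the language model's full softmax over the character vocabulary with an adaptive softmax, which groups rare tokens into lower-dimensional clusters and so cuts the output layer's memory use. A minimal sketch with PyTorch's `nn.AdaptiveLogSoftmaxWithLoss` (available in newer PyTorch releases than the 0.3-era build in the traceback; the vocabulary size, hidden size, and cutoffs below are illustrative):

```python
import torch
import torch.nn as nn

vocab_size = 8000   # illustrative Chinese character vocabulary
hidden_dim = 300    # illustrative LM hidden size

# Cluster the vocabulary by frequency: the head covers the 1000 most
# frequent characters; the tails handle the rest at reduced dimensionality.
adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden_dim,
    n_classes=vocab_size,
    cutoffs=[1000, 4000],
)

hidden = torch.randn(32, hidden_dim)           # LSTM outputs for 32 positions
targets = torch.randint(0, vocab_size, (32,))  # next-character targets
out = adaptive_softmax(hidden, targets)
out.loss.backward()  # use out.loss as the language-model loss term
```

For this to help, the indices fed to the model must be sorted so that low indices correspond to high-frequency characters, since the cutoffs partition the vocabulary by index.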