
Machine translation crash: memory issue

afarajian opened this issue on Apr 11, 2016 · 7 comments

Hi,

I am trying to run the machine translation example, but I get a memory allocation error:

```
ImportError: The following error happened while compiling the node,
Elemwise{Composite{Switch(i0, i1, Switch(AND(LT((i2 + i3), i1), GT(i4, i1)), i5, minimum((i2 + i3), i6)))}}(Elemwise{le,no_inplace}.0, TensorConstant{0}, TensorConstant{-1}, Elemwise{Composite{Switch(LT(Composite{Switch(LT(i0, i1), i1, i0)}(Composite{Switch(LT(i0, i1), i2, i0)}(Composite{(i0 - Switch(LT(i1, i2), i2, i1))}(i0, Composite{(i0 - Switch(GE(i1, i2), i2, i1))}(i1, Composite{Switch(LT(i0, i1), i2, i0)}(Composite{Switch(LT(i0, i1), (i0 + i2), i0)}(Composite{Switch(i0, i1, Switch(AND(LT(i2, i1), GT(i3, i1)), i4, maximum(i5, i2)))}(i2, i3, (i4 - i5), i5, i6, i7), i3, i8), i3, i9), i8), i3), i3, i1), i3), i10), Composite{Switch(LT(i0, i1), i1, i0)}(Composite{Switch(LT(i0, i1), i2, i0)}(Composite{(i0 - Switch(LT(i1, i2), i2, i1))}(i0, Composite{(i0 - Switch(GE(i1, i2), i2, i1))}(i1, Composite{Switch(LT(i0, i1), i2, i0)}(Composite{Switch(LT(i0, i1), (i0 + i2), i0)}(Composite{Switch(i0, i1, Switch(AND(LT(i2, i1), GT(i3, i1)), i4, maximum(i5, i2)))}(i2, i3, (i4 - i5), i5, i6, i7), i3, i8), i3, i9), i8), i3), i3, i1), i3), i10)}}.0, Elemwise{sub,no_inplace}.0, Elemwise{sub,no_inplace}.0, Elemwise{switch,no_inplace}.0)
/hltsrv0/farajian/.theano/compiledir_Linux-3.10-el7.x86_64-x86_64-with-redhat-7.2-Nitrogen-x86_64-2.7.11-64/tmpPi8duk/94aaeae5119dfd3722a2721c3fce5069.so: failed to map segment from shared object: Cannot allocate memory

Original exception:
    ImportError: The following error happened while compiling the same node:
    /hltsrv0/farajian/.theano/compiledir_Linux-3.10-el7.x86_64-x86_64-with-redhat-7.2-Nitrogen-x86_64-2.7.11-64/tmpPi8duk/94aaeae5119dfd3722a2721c3fce5069.so: failed to map segment from shared object: Cannot allocate memory
```

I tried different settings, but with no success. I also checked the maximum memory used while running the script, and it was less than 1GB (Max vmem = 830.188M). Any idea what causes this problem and how it can be solved?

afarajian · Apr 11 '16 16:04

Do you have a limit on the process? That is probably the only thing that could make an mmap fail like that.

You can check your current limits with "ulimit -a".
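If it is easier to check from inside the job itself, here is a minimal Python sketch using the standard resource module (RLIMIT_AS is the address-space limit that matters for mmap):

```python
import resource

# RLIMIT_AS is the virtual address-space limit; when it is exhausted,
# mmap'ing a compiled .so fails with "Cannot allocate memory".
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
fmt = lambda v: "unlimited" if v == resource.RLIM_INFINITY else v
print("virtual memory (RLIMIT_AS): soft=%s hard=%s" % (fmt(soft), fmt(hard)))
```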


abergeron · Apr 11 '16 16:04

I am running my experiments on our cluster, and to submit a job I need to specify the amount of memory my process will need. So far I have tried 10GB, 20GB, and 30GB, and nothing has changed. About ulimit -a: the virtual memory size is exactly the amount of memory I requested for the experiment, so it is 10, 20, or 30GB depending on the setting. Here are the results of ulimit -a in one of the experiments:

```
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 514573
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 514573
virtual memory          (kbytes, -v) 20971520
file locks                      (-x) unlimited
```
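For reference, that virtual memory line works out to 20GB; a small sketch of the arithmetic, plus a hypothetical in-process workaround (whether raising the soft limit gains anything depends on the hard limit the scheduler sets):

```python
import resource

# ulimit -v reports kbytes: 20971520 kB of address space is exactly 20 GB.
print("%.1f GB" % (20971520 * 1024 / float(1024 ** 3)))

# Hypothetical workaround: raise the soft limit up to the hard limit.
# Many schedulers cap both, in which case this gives no extra headroom.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (hard, hard))
```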

afarajian · Apr 11 '16 16:04

I can't really help any further, then. It works locally and there is no memory leak, so the problem somehow comes from the cluster configuration. I admit that I have no other ideas.


abergeron · Apr 12 '16 03:04

Thank you very much. Actually, the problem was related to an incompatibility between some of the libraries. It is solved now, after I removed all the libraries and installed updated versions.

But now I have another issue, which I think is related to memory again. With vocabularies of size 80,000 and larger, the process crashes. I believe this is because increasing the vocabulary size also increases the model size, so the whole model no longer fits in memory. Is there any way around this, to be able to work with larger vocabularies?
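As a rough back-of-the-envelope sketch of the effect (the dimensions below are assumptions in the style of Bahdanau et al., not values read from the example's config), the vocabulary-sized matrices alone grow like this:

```python
# Hypothetical estimate of the vocabulary-dependent parameters:
# source/target embeddings plus the output softmax projection.
embed_dim, hidden_dim = 620, 1000  # assumed Bahdanau-style dimensions
bytes_per_float = 4                # float32

def vocab_param_mb(vocab_size):
    n_params = 2 * vocab_size * embed_dim + vocab_size * hidden_dim
    return n_params * bytes_per_float / float(1024 ** 2)

for v in (30000, 80000):
    print("vocab %6d -> ~%4.0f MB of vocab-sized matrices" % (v, vocab_param_mb(v)))
```

Gradients and optimizer state multiply that several times over during training.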

Thanks.

afarajian · Apr 13 '16 09:04

Can you give the error message?

In the Theano docs there is a section about the speed/memory trade-off.
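For example (these are standard Theano flags, but check that doc section for the authoritative list), something along these lines trades speed for memory:

```python
import os

# Flags must be set before the first `import theano`.
# allow_gc=True frees intermediate results as soon as possible;
# scan.allow_gc=True does the same inside scan loops (slower, less memory).
os.environ["THEANO_FLAGS"] = "allow_gc=True,scan.allow_gc=True"

import theano
print("%s %s" % (theano.config.allow_gc, theano.config.scan.allow_gc))
```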


nouiz · Apr 13 '16 12:04

Using large vocabularies in NMT is an open problem; there are some papers on that, see e.g. http://arxiv.org/abs/1412.2007
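To give a flavor of the idea (a toy numpy sketch of scoring only a sampled shortlist of the vocabulary; hypothetical, not the implementation from that paper or from blocks):

```python
import numpy as np

def shortlist_softmax(hidden, W_out, candidates):
    # Score only a candidate subset of the vocabulary instead of all
    # 80k+ rows; this is the core trick behind sampled-softmax methods.
    logits = W_out[candidates].dot(hidden)
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

rng = np.random.RandomState(0)
W_out = rng.randn(80000, 100).astype("float32")  # toy output matrix
h = rng.randn(100).astype("float32")             # toy decoder state
cands = rng.choice(80000, 500, replace=False)    # sampled shortlist
print(shortlist_softmax(h, W_out, cands).sum())  # -> 1.0
```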


rizar · Apr 13 '16 17:04

@rizar: Thank you for the paper. @nouiz: Sorry for my late reply. I am busy with something else at the moment, so for now I am running the experiments with the 30K vocabulary. I will try larger vocabularies next week and will send you the errors then.

afarajian · Apr 17 '16 17:04