sdvae
out of memory
@Hanjun-Dai Thank you so much for uploading your code to GitHub.
I followed your instructions from step 1 to step 5, skipping step 4. Everything went well until I ran ./run_sample_prior.sh and ./run_valid_prior.sh, which gave error messages like this:
xuzhang@xuzhang1:/media/projects/sdvae/mol_vae/pytorch_eval$ ./run_sample_prior.sh
save_dir for use is ../../dropbox/results/zinc
using vae
a Conv1d inited
a Conv1d inited
a Conv1d inited
a Linear inited
a Linear inited
a Linear inited
a Linear inited
/media/projects/sdvae/mol_vae/pytorch_eval/../mol_common/pytorch_initializer.py:36: UserWarning: nn.init.orthogonal is now deprecated in favor of nn.init.orthogonal_.
nn.init.orthogonal(x0)
/media/projects/sdvae/mol_vae/pytorch_eval/../mol_common/pytorch_initializer.py:37: UserWarning: nn.init.orthogonal is now deprecated in favor of nn.init.orthogonal_.
nn.init.orthogonal(x1)
/media/projects/sdvae/mol_vae/pytorch_eval/../mol_common/pytorch_initializer.py:38: UserWarning: nn.init.orthogonal is now deprecated in favor of nn.init.orthogonal_.
nn.init.orthogonal(x2)
a GRU inited
a Linear inited
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524580978845/work/aten/src/THC/THCTensorRandom.cu line=25 error=2 : out of memory
Traceback (most recent call last):
File "sample_prior.py", line 65, in
/home/xuzhang/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py(249)()
-> return self._apply(lambda t: t.cuda(device))
(Pdb)
[7]+ Stopped ./run_sample_prior.sh
xuzhang@xuzhang1:/media/projects/sdvae/mol_vae/pytorch_eval$ ./run_valid_prior.sh
save_dir for use is ../../dropbox/results/zinc
using vae
a Conv1d inited
a Conv1d inited
a Conv1d inited
a Linear inited
a Linear inited
a Linear inited
a Linear inited
/media/projects/sdvae/mol_vae/pytorch_eval/../mol_common/pytorch_initializer.py:36: UserWarning: nn.init.orthogonal is now deprecated in favor of nn.init.orthogonal_.
nn.init.orthogonal(x0)
/media/projects/sdvae/mol_vae/pytorch_eval/../mol_common/pytorch_initializer.py:37: UserWarning: nn.init.orthogonal is now deprecated in favor of nn.init.orthogonal_.
nn.init.orthogonal(x1)
/media/projects/sdvae/mol_vae/pytorch_eval/../mol_common/pytorch_initializer.py:38: UserWarning: nn.init.orthogonal is now deprecated in favor of nn.init.orthogonal_.
nn.init.orthogonal(x2)
a GRU inited
a Linear inited
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524580978845/work/aten/src/THC/THCTensorRandom.cu line=25 error=2 : out of memory
Traceback (most recent call last):
File "valid_prior.py", line 59, in
main()
File "valid_prior.py", line 46, in main
model = ProxyModel()
File "/media/projects/sdvae/mol_vae/pytorch_eval/att_model_proxy.py", line 94, in __init__
self.ae = self.ae.cuda()
File "/home/xuzhang/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 249, in cuda
return self._apply(lambda t: t.cuda(device))
File "/home/xuzhang/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 176, in _apply
module._apply(fn)
File "/home/xuzhang/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 176, in _apply
module._apply(fn)
File "/home/xuzhang/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 182, in _apply
param.data = fn(param.data)
File "/home/xuzhang/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 249, in
return self._apply(lambda t: t.cuda(device))
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524580978845/work/aten/src/THC/THCTensorRandom.cu:25
/home/xuzhang/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py(249)()
-> return self._apply(lambda t: t.cuda(device))
(Pdb)
[8]+ Stopped ./run_valid_prior.sh
My Python is 3.5, and I converted the .py files from Python 2 to Python 3 using the 2to3 command. PyTorch is 0.4.0 and CUDA is 8.0. Could the newer PyTorch version be causing problems? Thanks in advance.
Updated:
I downgraded PyTorch from 0.4.0 to 0.3.1, but the errors are still there.
Updated: I found that after I canceled a job, its GPU memory was not released, so usage accumulated across runs until CUDA ran out of memory. The real errors are:
xuzhang@xuzhang1:/media/projects/sdvae/mol_vae/pytorch_eval$ ./run_sample_prior.sh
save_dir for use is ../../dropbox/results/zinc
using vae
a Conv1d inited
a Conv1d inited
a Conv1d inited
a Linear inited
a Linear inited
a Linear inited
a Linear inited
a GRU inited
a Linear inited
using mol_zinc.grammar
Traceback (most recent call last):
File "sample_prior.py", line 65, in
/home/xuzhang/anaconda3/lib/python3.5/site-packages/past/builtins/noniterators.py(252)oldrange()
-> return list(builtins.range(*args, **kwargs))
(Pdb)
[1]+ Stopped
I think it is because of the difference between range() and xrange(), but I am not sure how to correct it.
I solved the above problem by using //8 instead of /8 at line 73 of /sdvae/mol_vae/pytorch_eval/att_model_proxy.py.
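For anyone hitting the same thing after running 2to3, here is a minimal sketch of why the change helps. In Python 2, `/` between two ints floors the result, while in Python 3 it always returns a float, and `range()` only accepts integers. The variable names and the value 20 below are hypothetical stand-ins, not the actual expression at line 73 of att_model_proxy.py.

```python
# Hypothetical stand-in for the value divided by 8 in att_model_proxy.py;
# the real expression is not shown in this thread.
n_items = 20

# Python 3: "/" is true division and always returns a float.
chunk_float = n_items / 8    # 2.5

# "//" is floor division and returns an int when both operands are ints.
chunk_int = n_items // 8     # 2


def try_range(n):
    """Return list(range(n)), or the error message if range() rejects n."""
    try:
        return list(range(n))
    except TypeError as exc:
        return "TypeError: {}".format(exc)


print(try_range(chunk_float))  # range() rejects the float
print(try_range(chunk_int))    # [0, 1]
```

2to3 does not rewrite `/` into `//`, so any division whose result feeds `range()`, an index, or a shape probably needs the same manual check.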
Thank you very much