nmp_qc
RuntimeError during default execution
Hello, thank you for your implementation!
I've just tried to run the default experiment with
python main.py --no-cuda --epochs 1
and ran into the following problem:
/opt/conda/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
Prepare files
Define model
Statistics
Create model
Optimizer
Logger
=> no best model found at './checkpoint/qm9/mpnn/model_best.pth'
Check cuda
Traceback (most recent call last):
File "main.py", line 321, in <module>
main()
File "main.py", line 182, in main
train(train_loader, model, criterion, optimizer, epoch, evaluation, logger)
File "main.py", line 242, in train
output = model(g, h, e)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 319, in __call__
result = self.forward(*input, **kwargs)
File "/data/grishin/nmp_qc/models/MPNN.py", line 78, in forward
m = self.m[0].forward(h[t], h_aux, e_aux)
File "/data/grishin/nmp_qc/MessageFunction.py", line 43, in forward
return self.m_function(h_v, h_w, e_vw, args)
File "/data/grishin/nmp_qc/MessageFunction.py", line 175, in m_mpnn
h_w_rows = h_w[..., None].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()
RuntimeError: The expanded size of the tensor (25) must match the existing size (73) at non-singleton dimension 1
Am I doing something wrong? Thank you in advance.
Hi,
Sorry for the long delay in answering. In my opinion, the errors you reported come from the PyTorch version. I got similar errors in another codebase I am working on after changing the PyTorch release, due to changes in the behaviour of "sum".
https://github.com/pytorch/pytorch/releases "All reduce functions such as sum and mean now default to squeezing the reduced dimension."
As a quick fix, I suggest adding keepdim=True to the sum operations; keepdim=False is the new default and is what squeezes the reduced dimension.
In a few weeks, I will try to update the code for newer PyTorch versions.
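For reference, the behaviour change quoted from the release notes can be seen in a minimal sketch like the one below. The tensor shape is arbitrary and chosen only for illustration; it is not taken from the nmp_qc code.

```python
import torch

# Hypothetical tensor; the shape is only for illustration.
x = torch.ones(4, 3, 5)

# Default behaviour (keepdim=False): the reduced dimension is squeezed away.
squeezed = x.sum(1)

# keepdim=True retains the reduced dimension with size 1.
kept = x.sum(1, keepdim=True)

print(squeezed.shape)  # torch.Size([4, 5])
print(kept.shape)      # torch.Size([4, 1, 5])
```

Code that relies on the reduced dimension still being present (for example a later expand or view) would need the keepdim=True form.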
I tried to fix the problem and made some improvements, but I am not confident they are correct; it would be good if someone could verify them.
Hello, @priba
To make things easier, which version of pytorch are we supposed to be running?
Hello, @priba
Thanks for the implementation! I encountered the same issue here. I experimented with pytorch versions 0.2.0, 0.3.0 and 1.0.0, and I've also added keepdim=False to all sum operations in datasets.utils.py and models.MPNN.py, but none of them worked.
(rdkit) Adams-MacBook-Pro-4:mpnn iron4dam$ python main.py --no-cuda
Prepare files
Define model
Statistics
Create model
Optimizer
Logger
=> no best model found at './checkpoint/qm9/mpnn/model_best.pth'
Check cuda
Traceback (most recent call last):
File "main.py", line 320, in <module>
main()
File "main.py", line 182, in main
train(train_loader, model, criterion, optimizer, epoch, evaluation, logger)
File "main.py", line 241, in train
output = model(g, h, e)
File "/Users/iron4dam/anaconda3/envs/rdkit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
result = self.forward(*input, **kwargs)
File "/Users/iron4dam/Google_Drive/Part_C/Dissertation/dissertation_code/mpnn/models/MPNN.py", line 78, in forward
m = self.m[0].forward(h[t], h_aux, e_aux)
File "/Users/iron4dam/Google_Drive/Part_C/Dissertation/dissertation_code/mpnn/MessageFunction.py", line 43, in forward
return self.m_function(h_v, h_w, e_vw, args)
File "/Users/iron4dam/Google_Drive/Part_C/Dissertation/dissertation_code/mpnn/MessageFunction.py", line 174, in m_mpnn
h_w_rows = h_w[..., None].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()
RuntimeError: The expanded size of the tensor (25) must match the existing size (73) at non-singleton dimension 1. at /Users/soumith/minicondabuild3/conda-bld/pytorch_1512381214802/work/torch/lib/TH/generic/THTensor.c:309
@ay27 I've applied your patch and have another problem:
(nmpqc) rmrmg@kolos:/chematica/pka/nmpqc/nmp_qc$ LD_PRELOAD=$CONDA_PREFIX/lib/libstdc++.so python ./main.py --no-cuda
loaeed
Prepare files
Define model
Statistics
Create model
Optimizer
Logger
=> no best model found at './checkpoint/qm9/mpnn/model_best.pth'
Check cuda
Traceback (most recent call last):
File "./main.py", line 330, in <module>
main()
File "./main.py", line 191, in main
train(train_loader, model, criterion, optimizer, epoch, evaluation, logger)
File "./main.py", line 254, in train
losses.update(train_loss.data[0], g.size(0))
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
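The IndexError above points to its own remedy: a scalar loss is a 0-dim tensor in newer PyTorch releases, so it has to be read with .item() rather than indexed. A minimal sketch, assuming a stand-in scalar in place of the real train_loss (the value 0.25 is made up for illustration):

```python
import torch

# Stand-in for the scalar loss; the value is hypothetical.
train_loss = torch.tensor(0.25)

# Indexing a 0-dim tensor is what the traceback complains about.
try:
    value = train_loss.data[0]
    indexing_failed = False
except IndexError:
    indexing_failed = True

# .item() converts the 0-dim tensor to a plain Python float.
value = train_loss.item()
print(indexing_failed, value)
```

Under that assumption, the failing line in main.py would presumably become losses.update(train_loss.item(), g.size(0)); this has not been checked against the patched code.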
I ran into the same error:
h_w_rows = h_w[..., None].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()
RuntimeError: The expanded size of the tensor (24) must match the existing size (73) at non-singleton dimension 1
If this is due to a version update, could you tell me which version you are using? (I am using PyTorch 0.4.1.)
At that time I was using Pytorch 0.3.0
I made a small change like this:
h_w_rows = h_w[..., None].expand(h_w.size(0), h_w.size(1), h_v.size(1)).contiguous()
It seems to work, but I am not sure about the results.
@njwm I did the same in order to get past that error and it worked (even though another similar error came regarding a .sum operation). But can you please verify if it affected the results?
Perhaps it is better like this:
h_w_rows = h_w[:, None, :].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()
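To compare the variants discussed in this thread, here is a small sketch on toy tensors. It assumes h_v has shape (batch, nodes, hidden) and h_w is the same data flattened to (batch * nodes, hidden), which is consistent with the error message (25 nodes versus 73 features) but is not confirmed in the thread. The sizes B, N and D are hypothetical.

```python
import torch

# Hypothetical sizes: batch, nodes per graph, hidden features (N != D on purpose).
B, N, D = 2, 3, 4
h_v = torch.arange(B * N * D, dtype=torch.float32).view(B, N, D)
h_w = h_v.view(-1, D)  # assumed layout: (B*N, D)

# Original line: h_w[..., None] has shape (B*N, D, 1), so dimension 1 (size D)
# cannot be expanded to N and a RuntimeError is raised.
try:
    h_w[..., None].expand(h_w.size(0), h_v.size(1), h_w.size(1))
    original_fails = False
except RuntimeError:
    original_fails = True

# Variant with the new axis in the middle: each feature row is repeated N times.
rows = h_w[:, None, :].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()
flat = rows.view(-1, D)

# Variant with swapped target sizes: it runs, but after view(-1, D) the rows
# no longer contain whole feature vectors.
swapped = h_w[..., None].expand(h_w.size(0), h_w.size(1), h_v.size(1)).contiguous()
swapped_flat = swapped.view(-1, D)

print(original_fails)   # True
print(flat[:N])         # N copies of h_w[0]
print(swapped_flat[0])  # repeated single features rather than h_w[0]
```

In this toy setting both variants produce a tensor of the same shape, so no error is raised either way, but only the middle-axis variant keeps each row equal to an original feature vector. Whether that is the layout the model intends would still need to be verified against results.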
Regarding @njwm's change above (expanding to h_w.size(0), h_w.size(1), h_v.size(1)): I don't think it makes sense; it just gets past that error.