nmp_qc

RuntimeError during default execution

Open AlexanderGri opened this issue 7 years ago • 11 comments

Hello, thank you for your implementation!

I've just tried to run the default experiment with

python main.py --no-cuda --epochs 1

and ran into the following problem:

/opt/conda/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
Prepare files
Define model
        Statistics
        Create model
Optimizer
Logger
=> no best model found at './checkpoint/qm9/mpnn/model_best.pth'
Check cuda
Traceback (most recent call last):
  File "main.py", line 321, in <module>
    main()
  File "main.py", line 182, in main
    train(train_loader, model, criterion, optimizer, epoch, evaluation, logger)
  File "main.py", line 242, in train
    output = model(g, h, e)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 319, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/grishin/nmp_qc/models/MPNN.py", line 78, in forward
    m = self.m[0].forward(h[t], h_aux, e_aux)
  File "/data/grishin/nmp_qc/MessageFunction.py", line 43, in forward
    return self.m_function(h_v, h_w, e_vw, args)
  File "/data/grishin/nmp_qc/MessageFunction.py", line 175, in m_mpnn
    h_w_rows = h_w[..., None].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()
RuntimeError: The expanded size of the tensor (25) must match the existing size (73) at non-singleton dimension 1

Am I doing something wrong? Thank you in advance.

AlexanderGri avatar Dec 16 '17 14:12 AlexanderGri

Hi,

Sorry for the long delay in answering. In my opinion, the errors you reported come from the PyTorch version. I got similar errors in another codebase I am working on after changing the PyTorch release, due to changes in the behaviour of "sum".

https://github.com/pytorch/pytorch/releases "All reduce functions such as sum and mean now default to squeezing the reduced dimension."

I suggest adding keepdim=False to the sum operations as a quick and easy fix for this problem.
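A minimal sketch of the shape change described in that release note, written in plain Python so it runs without PyTorch. `reduced_shape` is a hypothetical helper, not part of PyTorch or this repository, and the sizes are made up. Per the PyTorch documentation for `torch.sum`, `keepdim` defaults to False (the reduced axis is dropped), while `keepdim=True` keeps that axis with size 1, so it is worth double-checking which value a given sum call needs.

```python
def reduced_shape(shape, dim, keepdim=False):
    """Shape after reducing `shape` over axis `dim` (hypothetical helper).

    Models the documented behaviour of reductions such as torch.sum:
    keepdim=False drops the axis, keepdim=True keeps it with size 1.
    """
    shape = tuple(shape)
    dim = dim % len(shape)  # allow negative axes such as -1
    if keepdim:
        return shape[:dim] + (1,) + shape[dim + 1:]
    return shape[:dim] + shape[dim + 1:]


# Example sizes are made up for illustration.
print(reduced_shape((10, 20), 1))                # (10,)
print(reduced_shape((10, 20), 1, keepdim=True))  # (10, 1)
```

Code that was written against the older non-squeezing behaviour would expect the second shape.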

In a few weeks, I will try to update the code for newer PyTorch versions.

priba avatar Feb 01 '18 09:02 priba

I tried to fix the problem and made some improvements, but I am not confident they are correct; someone may want to verify them.

ay27 avatar Feb 01 '18 14:02 ay27

Hello, @priba

To make things easier, which version of pytorch are we supposed to be running?

josejimenezluna avatar Jun 25 '18 11:06 josejimenezluna

Hello, @priba

Thanks for the implementation! I encountered the same issue here. I experimented with pytorch versions 0.2.0, 0.3.0 and 1.0.0, and I've also added keepdim=False to all sum operations in datasets.utils.py and models.MPNN.py, but none of them worked.

(rdkit) Adams-MacBook-Pro-4:mpnn iron4dam$ python main.py --no-cuda
Prepare files
Define model
	Statistics
	Create model
Optimizer
Logger
=> no best model found at './checkpoint/qm9/mpnn/model_best.pth'
Check cuda
Traceback (most recent call last):
  File "main.py", line 320, in <module>
    main()
  File "main.py", line 182, in main
    train(train_loader, model, criterion, optimizer, epoch, evaluation, logger)
  File "main.py", line 241, in train
    output = model(g, h, e)
  File "/Users/iron4dam/anaconda3/envs/rdkit/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/iron4dam/Google_Drive/Part_C/Dissertation/dissertation_code/mpnn/models/MPNN.py", line 78, in forward
    m = self.m[0].forward(h[t], h_aux, e_aux)
  File "/Users/iron4dam/Google_Drive/Part_C/Dissertation/dissertation_code/mpnn/MessageFunction.py", line 43, in forward
    return self.m_function(h_v, h_w, e_vw, args)
  File "/Users/iron4dam/Google_Drive/Part_C/Dissertation/dissertation_code/mpnn/MessageFunction.py", line 174, in m_mpnn
    h_w_rows = h_w[..., None].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()
RuntimeError: The expanded size of the tensor (25) must match the existing size (73) at non-singleton dimension 1. at /Users/soumith/minicondabuild3/conda-bld/pytorch_1512381214802/work/torch/lib/TH/generic/THTensor.c:309

adamxyang avatar Jan 19 '19 01:01 adamxyang

@ay27 I've applied your patch and have another problem:

(nmpqc) rmrmg@kolos:/chematica/pka/nmpqc/nmp_qc$ LD_PRELOAD=$CONDA_PREFIX/lib/libstdc++.so python ./main.py --no-cuda
loaeed
Prepare files
Define model
Statistics
Create model
Optimizer
Logger
=> no best model found at './checkpoint/qm9/mpnn/model_best.pth'
Check cuda
Traceback (most recent call last):
  File "./main.py", line 330, in <module>
    main()
  File "./main.py", line 191, in main
    train(train_loader, model, criterion, optimizer, epoch, evaluation, logger)
  File "./main.py", line 254, in train
    losses.update(train_loss.data[0], g.size(0))
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
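The IndexError itself points at `tensor.item()`. A minimal sketch of that conversion, assuming a PyTorch version where 0-dim tensors expose `.item()` (0.4 or later). `scalar_value` and `FakeLoss` are hypothetical names; `FakeLoss` is only a stand-in so the sketch runs without PyTorch.

```python
def scalar_value(loss):
    """Return a plain Python float from a 0-dim tensor-like or a number."""
    item = getattr(loss, "item", None)
    if callable(item):
        return float(item())  # 0-dim tensors expose .item()
    return float(loss)        # plain ints/floats pass straight through


class FakeLoss:
    """Stand-in for a 0-dim tensor, used only for this illustration."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


print(scalar_value(FakeLoss(0.25)))  # 0.25
print(scalar_value(3))               # 3.0
```

In main.py the failing call would then presumably become `losses.update(train_loss.item(), g.size(0))`; I have not verified that against the patched code.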

rmrmg avatar Mar 03 '19 08:03 rmrmg

I ran into the same error:

h_w_rows = h_w[..., None].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()
RuntimeError: The expanded size of the tensor (24) must match the existing size (73) at non-singleton dimension 1
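A minimal sketch of why this line fails, as a plain-Python model of the rule the message states: expand can stretch a dimension only if its existing size is 1. `expanded_shape` is a hypothetical helper, not PyTorch's implementation. The 2-D layout of `h_w` is inferred from the error message, and `n` is a made-up row count.

```python
def expanded_shape(existing, target):
    """Check an expand() request the way the error message describes it.

    Hypothetical helper: a dimension may be stretched only if its
    existing size is 1; otherwise the sizes must already match.
    This sketch only handles shapes with the same number of dimensions.
    """
    if len(existing) != len(target):
        raise ValueError("this sketch only handles equal-rank shapes")
    for dim, (have, want) in enumerate(zip(existing, target)):
        if have != 1 and have != want:
            raise RuntimeError(
                "The expanded size of the tensor (%d) must match the existing "
                "size (%d) at non-singleton dimension %d" % (want, have, dim)
            )
    return tuple(target)


n, nodes, feats = 100, 24, 73   # n is made up; 24 and 73 come from the error
h_w_shape = (n, feats)          # assumed 2-D layout of h_w
indexed = h_w_shape + (1,)      # shape of h_w[..., None]: (n, 73, 1)

try:
    expanded_shape(indexed, (n, nodes, feats))
except RuntimeError as err:
    print(err)
```

Under these assumptions the new singleton axis ends up last, so dimension 1 still has size 73 and cannot be stretched to 24.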

So if it is due to a version update, could you tell me which version you are using? (I am using PyTorch 0.4.1.)

wmmxk avatar Jun 09 '19 18:06 wmmxk

At that time I was using Pytorch 0.3.0

priba avatar Jun 10 '19 15:06 priba

I made a small change like this:

h_w_rows = h_w[..., None].expand(h_w.size(0), h_w.size(1), h_v.size(1)).contiguous()

It seems to work, but I am not sure about the results.

njwm avatar Aug 08 '21 13:08 njwm

I made a small change like this: h_w_rows = h_w[..., None].expand(h_w.size(0), h_w.size(1),h_v.size(1), ).contiguous() It seems to work. But i am not sure about the results.

@njwm I did the same in order to get past that error and it worked (even though another similar error came up regarding a .sum operation). But can you please verify whether it affected the results?

sthakurr avatar Mar 24 '22 06:03 sthakurr

Perhaps it is better like this:

h_w_rows = h_w[:, None, :].expand(h_w.size(0), h_v.size(1), h_w.size(1)).contiguous()
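A minimal sketch comparing this variant with the earlier `h_w[..., None]` change, using nested lists as a stand-in for tensors so it runs without PyTorch. All function names and the toy values are hypothetical. It assumes `h_w` is 2-D and that `k` plays the role of `h_v.size(1)`.

```python
def expand_last(h_w, k):
    """Model of h_w[..., None].expand(n, f, k): out[i][j][c] == h_w[i][j]."""
    return [[[value] * k for value in row] for row in h_w]


def expand_middle(h_w, k):
    """Model of h_w[:, None, :].expand(n, k, f): out[i][c][j] == h_w[i][j]."""
    return [[list(row) for _ in range(k)] for row in h_w]


def shape(nested):
    """Shape of a rectangular 3-level nested list."""
    return (len(nested), len(nested[0]), len(nested[0][0]))


def swap_last_two(nested):
    """Transpose the last two axes of a 3-level nested list."""
    return [[list(col) for col in zip(*block)] for block in nested]


h_w = [[1, 2, 3], [4, 5, 6]]   # toy values: n=2 rows, f=3 features
k = 2                          # plays the role of h_v.size(1)

a = expand_last(h_w, k)
b = expand_middle(h_w, k)
print(shape(a))                 # (2, 3, 2)
print(shape(b))                 # (2, 2, 3)
print(swap_last_two(a) == b)    # True
```

In this model the two variants hold the same values with the last two axes swapped, and only the second has the (n, h_v.size(1), h_w.size(1)) layout that the original line requested. Whether the downstream code depends on that layout is something I have not verified.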

njwm avatar Apr 22 '22 03:04 njwm

I made a small change like this: h_w_rows = h_w[..., None].expand(h_w.size(0), h_w.size(1),h_v.size(1), ).contiguous() It seems to work. But i am not sure about the results.

@njwm I did the same in order to get past that error and it worked (even though another similar error came regarding a .sum operation). But can you please verify if it affected the results?

I don't think it makes sense; it just gets past that error.

njwm avatar Apr 22 '22 03:04 njwm