AS-GCN

RuntimeError: shape '[-1, 3, 1, 25]' is invalid for input of size 3456

Open petteriTeikari opened this issue 6 years ago • 3 comments

I ran the warm-up pretraining for 9 epochs (instead of your 10), and that worked fine. I then wanted to continue with the training stage but hit this error on the 11th epoch (logged as epoch 10 below). So the 10th epoch trains without problems, and then the error appears on the 11th, where the permute-and-view of x_last fails.

[08.13.19|01:43:40] Training epoch: 9
AS-GCN/net/utils/adj_learn.py:11: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  soft_max_1d = F.softmax(trans_input)
[08.13.19|01:43:41] 	Iter 0 Done. | loss2: 1961.2838 | loss_nll: 1938.3290 | loss_kl: 22.9549 | lr: 0.000500
[08.13.19|01:44:21] 	Iter 100 Done. | loss2: 1593.5586 | loss_nll: 1572.3160 | loss_kl: 21.2426 | lr: 0.000500
.....
[08.13.19|02:33:32] 	Iter 7400 Done. | loss2: 1933.1755 | loss_nll: 1911.2896 | loss_kl: 21.8860 | lr: 0.000500
[08.13.19|02:34:12] 	Iter 7500 Done. | loss2: 1391.6711 | loss_nll: 1369.8772 | loss_kl: 21.7940 | lr: 0.000500
[08.13.19|02:34:17] 	mean_loss2: 1974.6090346069418
[08.13.19|02:34:17] 	mean_loss_nll: 1951.943839369832
[08.13.19|02:34:17] 	mean_loss_kl: 22.665194850461106
[08.13.19|02:34:17] Time consumption:
[08.13.19|02:34:17] Done.
[08.13.19|02:34:17] The model has been saved as ./work_dir/recognition/kinetics/AS_GCN/max_hop_4/lamda_05/epoch9_model1.pt.
[08.13.19|02:34:17] The model has been saved as ./work_dir/recognition/kinetics/AS_GCN/max_hop_4/lamda_05/epoch9_model2.pt.
[08.13.19|02:34:17] Eval epoch: 9
[08.13.19|02:36:22] 	mean_loss2: 2030.5040628313056
[08.13.19|02:36:22] 	mean_loss_nll: 2008.193604698859
[08.13.19|02:36:22] 	mean_loss_kl: 22.310456798226845
[08.13.19|02:36:22] Done.
[08.13.19|02:36:22] Training epoch: 10
Traceback (most recent call last):
  File "main.py", line 30, in <module>
    p.start()
  File "AS-GCN/processor/processor.py", line 111, in start
    self.train(training_A=False)
  File "AS-GCN/processor/recognition.py", line 161, in train
    x_class, pred, target = self.model1(data, target_data, data_last, A_batch, self.arg.lamda_act)
  File "/home/petteri/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "AS-GCN/net/as_gcn.py", line 59, in forward
    x_last = x_last.permute(0,4,1,2,3).contiguous().view(-1,3,1,25)
RuntimeError: shape '[-1, 3, 1, 25]' is invalid for input of size 3456
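
A quick arithmetic check makes the mismatch concrete. This is a minimal sketch that assumes the `x_last` shape `(32, 3, 1, 18, 2)` reported in the debug output further down in this issue; the helper name `view_is_valid` is made up for illustration and is not part of AS-GCN or PyTorch.

```python
from math import prod

def view_is_valid(input_shape, target_tail):
    """Return True if a tensor of input_shape can be viewed as (-1, *target_tail)."""
    return prod(input_shape) % prod(target_tail) == 0

# (N, C, T, V, M) of x_last for the Kinetics/OpenPose data in this issue
x_last_shape = (32, 3, 1, 18, 2)

print(prod(x_last_shape))                        # 3456, the size named in the error
print(view_is_valid(x_last_shape, (3, 1, 25)))   # False: 3456 is not a multiple of 3*1*25 = 75
print(view_is_valid(x_last_shape, (3, 1, 18)))   # True: 3456 / 54 = 64 = N * M
```

With 25 joints the same batch would hold 32 * 3 * 1 * 25 * 2 = 4800 elements, which is a multiple of 75, so the hard-coded view would only fit NTU-style data.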

I assume the error propagates from the data generation step, for which I used the npy generation code from the 2s-AGCN implementation: https://github.com/lshiwjx/2s-AGCN/blob/master/data_gen/kinetics_gendata.py

Looking at the size: is that line defined statically, with the last 25 being the number of joints in the dataset? If so, when I use Kinetics instead of NTU-RGBD, should it be set conditionally, as you do here for the OpenPose layout?

x_last = x_last.permute(0,4,1,2,3).contiguous().view(-1,3,1,25)
32 3 290 18 2 # (N, C, T, V, M)
x_last:  torch.Size([32, 3, 1, 18, 2])
x_recon:  torch.Size([32, 3, 290, 18])
x1:  torch.Size([32, 3, 290, 18, 2])
x2:  torch.Size([32, 2, 18, 3, 290])
x3:  torch.Size([64, 54, 290])

These shapes were printed with the following debug statements:

def forward(self, x, x_target, x_last, A_act, lamda_act):
     N, C, T, V, M = x.size()
     print(N, C, T, V, M)
     print('x_last: ', x_last.shape)
     x_recon = x[:,:,:,:,0]                                  # [2N, 3, 300, 25]
     print('x_recon: ', x_recon.shape)
     print('x1: ', x.shape)
     x = x.permute(0, 4, 3, 1, 2).contiguous()               # [N, 2, 25, 3, 300]
     print('x2: ', x.shape)
     x = x.view(N * M, V * C, T)                             # [2N, 75, 300]
     print('x3: ', x.shape)
     x_last = x_last.permute(0,4,1,2,3).contiguous().view(-1,3,1,25)
     print('x_last: ', x_last.shape)
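
One possible way to avoid the hard-coded joint count on that line is to read every dimension from the tensor itself. The sketch below is not the authors' fix: `reshape_last_frame` is a hypothetical helper, and it only covers this one `view` call. The comments later in this thread report that other places in the code also hard-code 25.

```python
import torch

def reshape_last_frame(x_last: torch.Tensor) -> torch.Tensor:
    """Fold the person axis into the batch: (N, C, T, V, M) -> (N*M, C, T, V)."""
    N, C, T, V, M = x_last.size()
    # Same permute as the original line, but C, T and V come from the tensor
    # instead of the literals 3, 1 and 25.
    return x_last.permute(0, 4, 1, 2, 3).contiguous().view(N * M, C, T, V)

kinetics = torch.zeros(32, 3, 1, 18, 2)    # 18 OpenPose joints, as in this issue
ntu = torch.zeros(32, 3, 1, 25, 2)         # 25 joints, as the original line expects

print(reshape_last_frame(kinetics).shape)  # torch.Size([64, 3, 1, 18])
print(reshape_last_frame(ntu).shape)       # torch.Size([64, 3, 1, 25])
```

Taking the sizes from `x_last.size()` keeps the line valid for both skeleton layouts without a dataset switch.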

petteriTeikari avatar Aug 13 '19 13:08 petteriTeikari

I also found a couple of other hard-coded 25s in your code. Replacing them with a variable set to 18 fixed my training problem.

petteriTeikari avatar Aug 13 '19 19:08 petteriTeikari

I changed all the joint counts ("key nodes") in the network structure to 18, but I still get an error saying the size is "different from the previous size". Which files need to be changed?
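
One way to locate remaining literals is to scan the Python sources for the integer 25. The script below is only a sketch using the standard `ast` module (Python 3.8 or newer); `find_int_literals` and `scan_tree` are hypothetical helper names, and the script cannot tell which hits are joint counts, nor does it see values stored in config or data files.

```python
import ast
from pathlib import Path

def find_int_literals(source: str, value: int = 25):
    """Return the sorted line numbers of integer literals equal to value."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # type(...) is int excludes floats such as 0.25 and booleans
        if isinstance(node, ast.Constant) and type(node.value) is int and node.value == value:
            hits.append(node.lineno)
    return sorted(hits)

def scan_tree(root: str, value: int = 25):
    """Print path:line for every matching literal in the .py files under root."""
    for path in sorted(Path(root).rglob("*.py")):
        try:
            hits = find_int_literals(path.read_text(encoding="utf-8"), value)
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files this Python version cannot parse
        for line in hits:
            print(f"{path}:{line}")

sample = "x = x.view(-1, 3, 1, 25)\ny = 0.25\nz = [25, 250]\n"
print(find_int_literals(sample))  # [1, 3]
```

Calling `scan_tree(".")` from the repository root would list candidate lines to review by hand.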

ljm150 avatar Oct 16 '19 02:10 ljm150

> And I found couple of other hard-coded 25s from your code, and after changing them to a variable containing 18 fixed my training problem

I changed all the joint counts ("key nodes") in the network structure to 18, but I still get an error saying the size is "different from the previous size". Which files need to be changed?

ljm150 avatar Oct 16 '19 13:10 ljm150