invalid argument 2: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces).

Open zeyayin opened this issue 6 years ago • 7 comments

Hello, I ran into an issue when trying to train a new model on Windows with PyTorch 0.4.1,

using the command !python scripts/train.py --noise_dim=0

I removed all the cuda() calls in order to run on the CPU, but I get this error:

Traceback (most recent call last):
  File "scripts/train.py", line 580, in <module>
    main(args)
  File "scripts/train.py", line 245, in main
    optimizer_d)
  File "scripts/train.py", line 371, in discriminator_step
    generator_out = generator(obs_traj, obs_traj_rel, seq_start_end)
  File "C:\Users\zeya\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\zeya\sgan-master\scripts\sgan\models.py", line 508, in forward
    final_encoder_h = self.encoder(obs_traj_rel)
  File "C:\Users\zeya\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\zeya\sgan-master\scripts\sgan\models.py", line 63, in forward
    obs_traj_embedding = self.spatial_embedding(obs_traj.view(-1, 2))
RuntimeError: invalid argument 2: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Call .contiguous() before .view(). at c:\new-builder_3\win-wheel\pytorch\aten\src\th\generic/THTensor.cpp:237

zeyayin avatar Dec 02 '18 22:12 zeyayin

Hi! Could you provide the script you have run, so that the issue can be recreated?

angeliand avatar Dec 02 '18 22:12 angeliand

> Hi! Could you provide the script you have run, so that the issue can be recreated?

Hi, I have edited my issue, but I am not quite sure what you mean by providing the script. Is the issue now in the form you expected?

zeyayin avatar Dec 02 '18 22:12 zeyayin

This was exactly what I wanted, thank you! This is weird though, as it runs just fine for me. I would try different things if I were you:

  • you could try changing line 63 of sgan\models.py to obs_traj_embedding = self.spatial_embedding(obs_traj.contiguous().view(-1, 2)), or
  • you could install the exact versions listed in the requirements, as version mismatches can cause problems too.

angeliand avatar Dec 02 '18 22:12 angeliand

After I changed line 63 of models.py and ran the command !python scripts/train.py --noise_dim= it just keeps running, or may even be stuck, without giving any output. I will try it on a desktop PC later.

zeyayin avatar Dec 02 '18 23:12 zeyayin

> This was exactly what I wanted, thank you! This is weird though, as it runs just fine for me. I would try different things if I were you:
>
> * you could try editing the code in [sgan\models.py line 63](https://github.com/agrimgupta92/sgan/blob/master/sgan/models.py#L63) to
>   `obs_traj_embedding = self.spatial_embedding(obs_traj.contiguous().view(-1, 2))`
>   or
> * you could install the exact [requirements](https://github.com/agrimgupta92/sgan/blob/master/requirements.txt), as this can cause problems too.

Hello, after I added contiguous() and ran it on the desktop, it started asking me to update the CUDA driver version, even though I have deleted all the cuda() calls. Is there any way I can run on the CPU instead of the GPU?

zeyayin avatar Dec 05 '18 00:12 zeyayin

The issue arises because seq_collate(data) in the data loader permutes the mini-batch without calling contiguous() afterwards. tensor.permute changes the strides without moving the data, so the result is no longer contiguous, and calling view on it raises this error. Training and testing on GPU are not subject to this issue because, as the output below shows, the tensor comes back contiguous after tensor.cuda(). Using reshape() instead of view() may fix the problem as well. This can be seen in the debugger output below:

obs_traj.view(-1, 2).shape = {RuntimeError}invalid argument 2: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Call .contiguous() before .view(). at /pytorch/aten/src/TH/generic/THTensor.cpp:213
obs_traj.numpy().strides = {tuple} <class 'tuple'>: (4, 64, 32)
obs_traj.contiguous().view(-1, 2).shape = {Size} torch.Size([1304, 2])
obs_traj.contiguous().numpy().strides = {tuple} <class 'tuple'>: (1304, 8, 4)
obs_traj.cuda().cpu().numpy().shape = {tuple} <class 'tuple'>: (8, 163, 2)
obs_traj.cuda().cpu().numpy().strides = {tuple} <class 'tuple'>: (1304, 8, 4)
obs_traj.reshape(-1, 2).numpy().shape = {tuple} <class 'tuple'>: (1304, 2)
obs_traj.reshape(-1, 2).numpy().strides = {tuple} <class 'tuple'>: (8, 4)
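
The stride values above can be reproduced with plain arithmetic. The following is a minimal sketch, assuming the collated batch is a float32 tensor of shape (163, 2, 8) that is permuted with (2, 0, 1). That layout is inferred from the strides in the output above rather than confirmed in this thread, and the helper names are hypothetical.

```python
def contiguous_strides(shape, itemsize=4):
    """Byte strides of a row-major (C-contiguous) array with the given shape."""
    strides = []
    step = itemsize
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def permute(seq, order):
    """Reorder a tuple the way tensor.permute reorders dimensions."""
    return tuple(seq[i] for i in order)


# Assumed layout before the permute (inferred, not confirmed): float32, shape (163, 2, 8).
base_shape = (163, 2, 8)
base_strides = contiguous_strides(base_shape)  # (64, 32, 4)

# permute only relabels the axes; the bytes in memory stay where they are.
order = (2, 0, 1)
shape = permute(base_shape, order)
strides = permute(base_strides, order)

print(shape, strides)                        # (8, 163, 2) (4, 64, 32)
print(contiguous_strides(shape))             # (1304, 8, 4)
print(strides == contiguous_strides(shape))  # False, so the tensor is not contiguous
```

The permuted strides (4, 64, 32) differ from the (1304, 8, 4) that a contiguous tensor of the same shape would have, which is why view(-1, 2) cannot merge the first two dimensions without copying.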

xieshuaix avatar Jan 10 '19 06:01 xieshuaix

In models.py:

  • change line ~67 to: obs_traj_embedding = self.spatial_embedding(obs_traj.contiguous().view(-1, 2))
  • change line ~221 to: curr_hidden = h_states.contiguous().view(-1, self.h_dim)[start:end]

This resolved the issue for me.
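
For anyone who wants to check the behaviour outside the sgan code base, here is a minimal standalone sketch. The tensor is a stand-in whose shape and permute order are assumptions for illustration; it is not the project's actual data loader output.

```python
import torch

# Stand-in for the permuted batch; shape and permute order are assumed.
base = torch.arange(163 * 2 * 8, dtype=torch.float32).reshape(163, 2, 8)
obs_traj = base.permute(2, 0, 1)  # shape (8, 163, 2), no longer contiguous

print(obs_traj.is_contiguous())  # False

# view() needs compatible strides, so it fails on the permuted tensor.
try:
    obs_traj.view(-1, 2)
    view_failed = False
except RuntimeError:
    view_failed = True

# Either of these copies the data into a contiguous layout first.
fixed = obs_traj.contiguous().view(-1, 2)
reshaped = obs_traj.reshape(-1, 2)

print(view_failed, tuple(fixed.shape), tuple(reshaped.shape))
```

Both fixes give the same (1304, 2) result here. reshape() only copies when it has to, while contiguous().view() makes the copy explicit.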

davidglavas avatar Apr 29 '19 11:04 davidglavas