
Getting "RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor"

Open Jeriousman opened this issue 2 years ago • 7 comments

Describe the bug(问题描述)
history = model.fit(x, y, batch_size=256, epochs=20, verbose=1, validation_split=0.4, shuffle=True)
When I try model.fit for the DIEN model with run_dien.py from your default examples, it works when I set the device to cpu, but with cuda I get the error below.

cuda ready...
0it [00:00, ?it/s]cuda:0
Train on 4 samples, validate on 0 samples, 2 steps per epoch

Traceback (most recent call last):

  File "<ipython-input-1-e985ce1c0aa2>", line 69, in <module>
    history = model.fit(x, y, batch_size=2, epochs=10, verbose=1, validation_split=0, shuffle=False)

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/deepctr_torch/models/basemodel.py", line 244, in fit
    y_pred = model(x).squeeze()

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/deepctr_torch/models/dien.py", line 92, in forward
    masked_interest, aux_loss = self.interest_extractor(keys_emb, keys_length, neg_keys_emb)

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/deepctr_torch/models/dien.py", line 221, in forward
    enforce_sorted=False)

  File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/torch/nn/utils/rnn.py", line 244, in pack_padded_sequence
    _VF._pack_padded_sequence(input, lengths, batch_first)

RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor

So I tried lengths.cpu() and lengths.to('cpu'), but neither of them solved the problem. Can you please provide a solution?

Operating environment(运行环境):

  • python version 3.6
  • torch version 1.7.1
  • deepctr-torch version 0.2.7

Jeriousman avatar Mar 17 '22 01:03 Jeriousman

In newer versions of PyTorch, the lengths argument of torch.nn.utils.rnn.pack_padded_sequence changed: it must now be a 1D int64 tensor on the CPU. (Details can be found in https://github.com/pytorch/pytorch/issues/43227)

[screenshots of the updated pack_padded_sequence documentation]
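A minimal, self-contained illustration of the requirement described above (not DeepCTR-Torch code; the tensors here are made up for demonstration): since the change, passing lengths through .cpu() at the call site satisfies the check, and it is a no-op if the tensor is already on the CPU.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# A padded batch of 2 sequences, batch_first layout: (batch, max_len, features).
x = torch.randn(2, 4, 3)
# True sequence lengths; in GPU training this tensor often lives on cuda:0,
# which is exactly what triggers the RuntimeError in newer PyTorch versions.
lengths = torch.tensor([4, 2])

# Passing lengths.cpu() satisfies the "1D CPU int64 tensor" requirement.
packed = pack_padded_sequence(x, lengths.cpu(), batch_first=True,
                              enforce_sorted=False)
# packed.data stacks the non-padded timesteps: 4 + 2 = 6 rows of 3 features.
print(packed.data.shape)
```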

zanshuxun avatar Apr 04 '22 12:04 zanshuxun

Obviously I tried that. As I said, none of them worked. I had to downgrade all the way to torch 1.4.0 to get it working.

Jeriousman avatar Apr 04 '22 23:04 Jeriousman

Obviously I tried that. As I said, none of them worked. I had to downgrade all the way to torch 1.4.0 to get it working.

Where did you use .cpu()? Did the device of the tensor change after you used .cpu()?

zanshuxun avatar Apr 05 '22 09:04 zanshuxun

Yes, I did, as I mentioned above:

So I tried lengths.cpu() and lengths.to('cpu'), but neither of them solved the problem

The lengths tensor is the one I tried to move to the CPU, exactly as mruberry and ngimel suggested there. That was also the first page I found when I was trying to fix the problem.

Jeriousman avatar Apr 06 '22 01:04 Jeriousman

  1. Where did you use .cpu()?

Could you tell me the corresponding line number in the code? for example:

https://github.com/shenweichen/DeepCTR-Torch/blob/b4d8181e86c2165722fa9331bc16185832596232/deepctr_torch/models/dien.py#L220-L221

Did you set masked_keys_length.cpu() here?

or other places like https://github.com/shenweichen/DeepCTR-Torch/blob/b4d8181e86c2165722fa9331bc16185832596232/deepctr_torch/models/dien.py#L356

or

https://github.com/shenweichen/DeepCTR-Torch/blob/b4d8181e86c2165722fa9331bc16185832596232/deepctr_torch/models/dien.py#L365

  2. Did the device of the tensor change after you use .cpu()?

Could you print the device of the tensor before and after your .cpu() call, to check whether it takes effect? If it does, the error "RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor" should not occur.
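One general PyTorch pitfall worth checking here (an assumption about what might have gone wrong, not something confirmed in this thread): .cpu() is not in-place. It returns a new tensor and leaves the original on its device, so the result must be assigned back or passed directly to the call.

```python
import torch

# Use the GPU if available so the demonstration mirrors the reported setup.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
lengths = torch.tensor([3, 1], device=device)

lengths.cpu()                # no effect: the returned CPU copy is discarded
print(lengths.device)        # still the original device

lengths = lengths.cpu()      # rebind the name to the CPU copy
print(lengths.device)        # now cpu
```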

zanshuxun avatar Apr 06 '22 06:04 zanshuxun

Hello. I did that for all the pack_padded_sequence calls, for example masked_keys_length.cpu(). When I did this, the tensor was converted to a CPU one, but the error was still there. For me, only downgrading the torch version worked. It is strange, though: that was the whole point of the question. It became a CPU tensor, but it didn't work. Is it working on your side?

Jeriousman avatar Apr 07 '22 07:04 Jeriousman

@Jeriousman I added .cpu() to all the pack_padded_sequence(...) calls in dien.py, and it works. Maybe you missed something. Could you paste the traceback info and your dien.py file?

zanshuxun avatar Jun 27 '22 03:06 zanshuxun

Hi, can anyone help? I get the same error on torch==1.8.0. How should I handle this?

umanniyaz avatar Mar 15 '23 20:03 umanniyaz