DeepCTR-Torch
Getting "RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor"
Describe the bug(问题描述)
history = model.fit(x, y, batch_size=256, epochs=20, verbose=1, validation_split=0.4, shuffle=True)
When I run model.fit for the DIEN model using your default example run_dien.py, it works when I set the device to cpu, but with cuda I get the error below.
cuda ready...
0it [00:00, ?it/s]cuda:0
Train on 4 samples, validate on 0 samples, 2 steps per epoch
Traceback (most recent call last):
File "<ipython-input-1-e985ce1c0aa2>", line 69, in <module>
history = model.fit(x, y, batch_size=2, epochs=10, verbose=1, validation_split=0, shuffle=False)
File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/deepctr_torch/models/basemodel.py", line 244, in fit
y_pred = model(x).squeeze()
File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/deepctr_torch/models/dien.py", line 92, in forward
masked_interest, aux_loss = self.interest_extractor(keys_emb, keys_length, neg_keys_emb)
File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/deepctr_torch/models/dien.py", line 221, in forward
enforce_sorted=False)
File "/home/hojun/anaconda3/envs/ai/lib/python3.6/site-packages/torch/nn/utils/rnn.py", line 244, in pack_padded_sequence
_VF._pack_padded_sequence(input, lengths, batch_first)
RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor
So I tried lengths.cpu() and lengths.to('cpu'), but neither solved the problem. Can you please provide a solution?
Operating environment(运行环境):
- python version 3.6
- torch version 1.7.1
- deepctr-torch version 0.2.7
In newer versions of PyTorch, the lengths argument of torch.nn.utils.rnn.pack_padded_sequence has changed: it must now be a 1D CPU int64 tensor, even when the input sequences are on the GPU. (Details can be found in https://github.com/pytorch/pytorch/issues/43227)
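A minimal, self-contained sketch of that constraint (toy data, not the DIEN code itself): the padded batch can live on the GPU, but the lengths tensor passed to pack_padded_sequence must be on the CPU.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Toy padded batch: 2 sequences of max length 4, embedding dim 3.
device = "cuda" if torch.cuda.is_available() else "cpu"
seqs = torch.randn(2, 4, 3, device=device)
lengths = torch.tensor([4, 2], dtype=torch.int64, device=device)

# Since PyTorch 1.7, `lengths` must be a CPU tensor even when `input`
# is on the GPU, so pass `lengths.cpu()` at the call site:
packed = pack_padded_sequence(seqs, lengths.cpu(),
                              batch_first=True, enforce_sorted=False)
```

The packed data stays on the same device as `seqs`; only the lengths are required to be on the CPU.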
Obviously I tried that. But as I said, none of them worked. I had to go all the way down to torch 1.4.0 to get it working.
Where did you use .cpu()? Did the device of the tensor change after you used .cpu()?
Yes, I did, as I mentioned below:
So I tried lengths.cpu() and lengths.to('cpu'), and neither of them solved the problem.
The lengths argument is the one I tried to move to the CPU, exactly as mruberry and ngimel suggested in that issue. That was also the first web page I found when I was trying to fix the problem.
- Where did you use .cpu()? Could you tell me the corresponding line number in the code? For example:
https://github.com/shenweichen/DeepCTR-Torch/blob/b4d8181e86c2165722fa9331bc16185832596232/deepctr_torch/models/dien.py#L220-L221
Did you set masked_keys_length.cpu() here? Or at other places, like
https://github.com/shenweichen/DeepCTR-Torch/blob/b4d8181e86c2165722fa9331bc16185832596232/deepctr_torch/models/dien.py#L356
or
https://github.com/shenweichen/DeepCTR-Torch/blob/b4d8181e86c2165722fa9331bc16185832596232/deepctr_torch/models/dien.py#L365
- Did the device of the tensor change after you used .cpu()? Could you print the device of the tensor before and after your .cpu() call, to check whether it works? If it does, the error "RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor" should not occur.
Hello. I did that for all the pack_padded_sequence calls, e.g. masked_keys_length.cpu(). When I did this, the tensor was converted to a CPU one, but the error was still there. For me, only downgrading the torch version worked. It is strange, though; that was the whole point of my question: it became a CPU tensor, but it still didn't work. Is it working on your side?
@Jeriousman I added .cpu() in all the pack_padded_sequence(...) calls in dien.py, and then it works. Maybe you missed something. Could you paste the traceback info and your dien.py file?
Hi, can anyone tell me how to handle the same error on torch==1.8.0?