[Bug] RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Describe the bug
I'm trying to run Tacotron2 training, but I receive RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
To Reproduce
CUDA_VISIBLE_DEVICES="0" python3 train_tacotron_ddc.py
Expected behavior
No response
Logs
admin@8f7837b57ed6:~/TTS$ CUDA_VISIBLE_DEVICES="0" python3 train_tacotron_ddc.py
> Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > log_func:np.log
| > min_level_db:-100
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:20
| > fft_size:1024
| > power:1.5
| > preemphasis:0.0
| > griffin_lim_iters:60
| > signal_norm:False
| > symmetric_norm:True
| > mel_fmin:0
| > mel_fmax:8000
| > pitch_fmin:0.0
| > pitch_fmax:640.0
| > spec_gain:1.0
| > stft_pad_mode:reflect
| > max_norm:4.0
| > clip_norm:True
| > do_trim_silence:True
| > trim_db:60.0
| > do_sound_norm:False
| > do_amp_to_db_linear:True
| > do_amp_to_db_mel:True
| > do_rms_norm:False
| > db_level:None
| > stats_path:None
| > base:2.718281828459045
| > hop_length:256
| > win_length:1024
> Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > log_func:np.log
| > min_level_db:-100
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:20
| > fft_size:1024
| > power:1.5
| > preemphasis:0.0
| > griffin_lim_iters:60
| > signal_norm:False
| > symmetric_norm:True
| > mel_fmin:0
| > mel_fmax:8000
| > pitch_fmin:0.0
| > pitch_fmax:640.0
| > spec_gain:1.0
| > stft_pad_mode:reflect
| > max_norm:4.0
| > clip_norm:True
| > do_trim_silence:True
| > trim_db:60.0
| > do_sound_norm:False
| > do_amp_to_db_linear:True
| > do_amp_to_db_mel:True
| > do_rms_norm:False
| > db_level:None
| > stats_path:None
| > base:2.718281828459045
| > hop_length:256
| > win_length:1024
| > Found 9039 files in /home/admin/M-AI-Labs/resampled_to_22050/by_book/male/minaev/oblomov
> Using CUDA: True
> Number of GPUs: 1
> Model has 47669492 parameters
> Number of output frames: 6
> EPOCH: 0/1000
--> /home/admin/TTS/run-August-02-2022_11+05AM-903a77c1
> DataLoader initialization
| > Tokenizer:
| > add_blank: False
| > use_eos_bos: False
| > use_phonemes: True
| > phonemizer:
| > phoneme language: ru-ru
| > phoneme backend: gruut
| > Number of instances : 8949
| > Preprocessing samples
| > Max text length: 216
| > Min text length: 3
| > Avg text length: 99.18292546653258
|
| > Max audio length: 583682.0
| > Min audio length: 26014.0
| > Avg audio length: 182216.04805006145
| > Num. instances discarded samples: 0
| > Batch group size: 0.
> TRAINING (2022-08-02 11:05:38)
/home/admin/TTS/TTS/tts/models/tacotron2.py:331: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
alignment_lengths = (
! Run is removed from /home/admin/TTS/run-August-02-2022_11+05AM-903a77c1
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1534, in fit
self._fit()
File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1518, in _fit
self.train_epoch()
File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1283, in train_epoch
_, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1115, in train_step
outputs, loss_dict_new, step_time = self._optimize(
File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 999, in _optimize
outputs, loss_dict = self._model_train_step(batch, model, criterion)
File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 955, in _model_train_step
return model.train_step(*input_args)
File "/home/admin/TTS/TTS/tts/models/tacotron2.py", line 339, in train_step
loss_dict = criterion(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/admin/TTS/TTS/tts/layers/losses.py", line 440, in forward
self.criterion_st(stopnet_output, stopnet_target, stop_target_length)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/admin/TTS/TTS/tts/layers/losses.py", line 193, in forward
loss = functional.binary_cross_entropy_with_logits(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 3150, in binary_cross_entropy_with_logits
return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Environment
{
"CUDA": {
"GPU": [
"NVIDIA GeForce RTX 2080 Ti",
"NVIDIA GeForce RTX 2080 Ti"
],
"available": true,
"version": "10.2"
},
"Packages": {
"PyTorch_debug": false,
"PyTorch_version": "1.12.0+cu102",
"TTS": "0.7.1",
"numpy": "1.21.6"
},
"System": {
"OS": "Linux",
"architecture": [
"64bit",
"ELF"
],
"processor": "x86_64",
"python": "3.8.10",
"version": "#36~20.04.1-Ubuntu SMP Fri Aug 27 08:06:32 UTC 2021"
}
}
Additional context
No response
Check the device type of the tensors at the error line. Move them all to either GPU or cpu and run.
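For example, a generic illustration of that idea (the tensor names below are placeholders, not the actual TTS variables):

import torch

device = "cuda:0" if torch.cuda.is_available() else "cpu"
logits = torch.randn(8, 1, device=device)   # model output, on the GPU when available
target = torch.zeros(8, 1)                  # accidentally left on the CPU
pos_weight = torch.tensor([10.0])           # also on the CPU

# check where each tensor lives
for name, t in [("logits", logits), ("target", target), ("pos_weight", pos_weight)]:
    print(name, t.device)

# move everything to one device before the op that fails
target = target.to(logits.device)
pos_weight = pos_weight.to(logits.device)
loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, target, pos_weight=pos_weight)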
@p0p4k Can you elaborate a bit, please? I'm having the same problem.
@BrukArkady The answer is given by @p0p4k. To be precise, you should change line #3150
of functional.py
from:
return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
to:
return torch.binary_cross_entropy_with_logits(input.cuda(), target.cuda(), weight, pos_weight.cuda(), reduction_enum)
For a precise analysis of the error, add the following code so you can see each tensor's device. In the TTS/TTS/tts/layers/losses.py file:

# add immediately above L193
tensors_to_check = [x.masked_select(mask), target.masked_select(mask), self.pos_weight]
for t in tensors_to_check:
    try:
        print(f'tensor {t} is on GPU device - {t.get_device()}')
    except RuntimeError:
        print(f'tensor {t} is on cpu')

# add immediately above L197
tensors_to_check = [x, target, self.pos_weight]
for t in tensors_to_check:
    try:
        print(f'tensor {t} is on GPU device - {t.get_device()}')
    except RuntimeError:
        print(f'tensor {t} is on cpu')

Then you can just call .cuda() on the offending tensor to move it to the GPU. You can also do this directly, without the debugging step above.
I would prefer making the change in the TTS file rather than in the PyTorch library.
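For example, a sketch of the same .cuda() change made inside the TTS loss instead of inside PyTorch, based on the call around L193 of TTS/TTS/tts/layers/losses.py referenced above (this assumes, as the debug prints would show, that self.pos_weight is the tensor left on the CPU; the exact code in your version may differ):

# inside BCELossMasked.forward, at the call that currently raises the error
loss = functional.binary_cross_entropy_with_logits(
    x.masked_select(mask),
    target.masked_select(mask),
    pos_weight=self.pos_weight.cuda(),  # move the CPU tensor to the GPU
)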
Got it. Thanks.
Then again, one issue is that on a machine with no GPU we would have to write an if/else for that. Can you make a PR?
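One way to avoid the if/else entirely would be to register pos_weight as a buffer so it follows the module when it is moved with .cuda()/.to(device) on any machine. A minimal sketch, assuming BCELossMasked in losses.py currently creates pos_weight as a plain tensor in its constructor (which is why it stays on the CPU):

import torch
from torch import nn

class BCELossMasked(nn.Module):
    # sketch of the constructor only; the rest of the class stays unchanged
    def __init__(self, pos_weight: float = None):
        super().__init__()
        # A registered buffer travels with the module: model.cuda() or model.to(device)
        # moves it automatically, so forward() needs no device checks at all.
        self.register_buffer(
            "pos_weight", torch.tensor([pos_weight]) if pos_weight is not None else None
        )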
I'm afraid I can't right now. I'll look into it once I get some spare time.
Same exact issue here... can someone comment on this issue when it is fixed, so I can try again? Thanks.
Fixed by #1872