
Training on Tesla K80

Open StuteePatil opened this issue 4 years ago • 3 comments

Hi, using a Tesla K80 to train the model gives the following error. Does the model require a specific GPU architecture for training?

File "train.py", line 290, in main() File "train.py", line 50, in main mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,)) File "/anaconda/envs/vits/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn return start_processes(fn, args, nprocs, join, daemon, start_method='spawn') File "/anaconda/envs/vits/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes while not context.join(): File "/anaconda/envs/vits/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join raise Exception(msg) Exception:

-- Process 0 terminated with the following error: Traceback (most recent call last): File "/anaconda/envs/vits/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap fn(i, *args) File "/media/hdd1tb/tts-VITS/vits-main/train.py", line 117, in run train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval]) File "/media/hdd1tb/tts-VITS/vits-main/train.py", line 162, in train_and_evaluate hps.data.mel_fmax File "/media/hdd1tb/tts-VITS/vits-main/mel_processing.py", line 105, in mel_spectrogram_torch center=center, pad_mode='reflect', normalized=False, onesided=True) File "/anaconda/envs/vits/lib/python3.7/site-packages/torch/functional.py", line 465, in stft return _VF.stft(input, n_fft, hop_length, win_length, window, normalized, onesided) RuntimeError: cuFFT doesn't support signals of half type with compute capability less than SM_53, but the device containing input half tensor only has SM_37

StuteePatil · Jul 28 '21

Did you try setting "fp16_run": false?
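
In the stock VITS configs that flag lives under the "train" section. A minimal sketch, assuming a config such as configs/ljs_base.json (only the relevant key is shown; the rest of the section stays unchanged):

    "train": {
        "fp16_run": false
    }

With fp16_run disabled, the mel/STFT inputs stay in float32, so training never hits the half-precision cuFFT path that SM_37 cards like the K80 lack.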

nikich340 · Nov 23 '21

You cannot train vits on a K80. The K80 is a very weak GPU. You need at least a Tesla P100 or a T4 to avoid errors when training. One explanation is that the K80 doesn't have enough memory for training.

skilomlg · Feb 15 '22

> You cannot train vits on a K80. The K80 is a very weak GPU. You need at least a Tesla P100 or a T4 to avoid errors when training. One explanation is that the K80 doesn't have enough memory for training.

That's not true. You can train on any GPU that supports CUDA, but you have to set a fitting batch size. It reduces the resulting quality, true, but that doesn't mean "you cannot train".
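
For example, on a small-memory card you can lower batch_size in the "train" section of the config and combine it with the "fp16_run": false change above. A sketch assuming the stock configs/ljs_base.json layout; the value 16 is only illustrative, and a K80 may need something even smaller:

    "train": {
        "fp16_run": false,
        "batch_size": 16
    }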

nikich340 · Mar 04 '22