DeepConvolutionalTTS-pytorch
OOM error in SSRN
I'm trying to reproduce the results.
After finishing Text2Mel training, I tried to train SSRN. However, it ran out of GPU memory.
https://github.com/Yangyangii/DeepConvolutionalTTS-pytorch/blob/master/train.py#L195 https://github.com/Yangyangii/DeepConvolutionalTTS-pytorch/blob/master/model.py#L75
Do you have any idea?
Traceback (most recent call last):
  File "train.py", line 223, in <module>
    main(network=network)
  File "train.py", line 206, in main
    batch_size=args.batch_size, ckpt_dir=ckpt_dir, writer=writer)
  File "train.py", line 53, in train
    mags_hat = model(mels) # mags_hat: (N, Ty, n_mags)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/git/dctts/model.py", line 75, in forward
    Z = torch.sigmoid(Z)
RuntimeError: CUDA out of memory. Tried to allocate 404.50 MiB (GPU 0; 15.75 GiB total capacity; 13.40 GiB already allocated; 225.88 MiB free; 1.13 GiB cached)
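For a sense of scale, the failed 404.50 MiB request in the error above is the size of one float32 activation with roughly 106 million elements. Below is a minimal sketch of that arithmetic. The helper names are made up for illustration, and the shape (32, 1024, 3236) is only a hypothetical example that happens to match the element count; it is not taken from the repository's config.

```python
from math import prod

MIB = 1024 ** 2  # bytes per MiB


def tensor_mib(shape, bytes_per_element=4):
    """Size in MiB of a dense tensor with the given shape (float32 by default)."""
    return prod(shape) * bytes_per_element / MIB


def elements_for_mib(mib, bytes_per_element=4):
    """Number of elements that fit in `mib` MiB at the given element size."""
    return int(mib * MIB / bytes_per_element)


# Size of the allocation that failed in the traceback.
n_elements = elements_for_mib(404.50)

# Hypothetical (batch, channels, time) shape with the same element count.
example_shape = (32, 1024, 3236)
example_size = tensor_mib(example_shape)

# The size of one such activation scales linearly with the batch dimension.
halved_batch_size = tensor_mib((16, 1024, 3236))

print(n_elements, example_size, halved_batch_size)
```

Since an activation like this grows linearly with batch size, halving the batch halves each such allocation, although the model's weights and optimizer state stay the same size.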
Hi, yhgon
In my case, it worked on a GTX1080ti. Could you try reducing the mini-batch size? You can change it in config.py.
Thank you
@Yangyangii I set the batch size to 1 or 2 but still have this issue. IMHO, it tries to load multiple checkpoint files instead of only the latest one.
@yhgon
I cloned the repository and retried training with the original configuration. It worked on a GTX1080 and needed about 8000 MB of GPU memory. To train SSRN, the code doesn't load a checkpoint file; SSRN is trained as an independent process. Could you give me more information (e.g. an nvidia-smi screenshot, your modifications to the original code, or the log directory)?
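If a screenshot is inconvenient, the same memory numbers can be captured as text. Below is a rough sketch that assumes `nvidia-smi` is on the PATH. The function names are hypothetical, and the sample values in the usage lines are invented for illustration, not real measurements.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse lines of 'used, total' (MiB), one GPU per line."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append({"used_mib": used, "total_mib": total})
    return rows


def query_gpu_memory():
    """Return per-GPU memory usage, or None if nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)


# Parsing an invented two-GPU sample, to show the output format.
sample = parse_gpu_memory("13721, 16130\n3, 16130\n")
print(sample)
```

Running `query_gpu_memory()` before starting train.py would show whether another process is already holding most of the GPU memory.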
How long does Text2Mel training take to finish? My run has been going for more than 10 hours and is still in progress.