DeepConvolutionalTTS-pytorch

OOM error in SSRN

Open yhgon opened this issue 5 years ago • 4 comments

I'm trying to reproduce the results.

After finishing Text2Mel training, I tried to train SSRN, but it hit an out-of-memory error.

https://github.com/Yangyangii/DeepConvolutionalTTS-pytorch/blob/master/train.py#L195
https://github.com/Yangyangii/DeepConvolutionalTTS-pytorch/blob/master/model.py#L75

Do you have any idea?

Traceback (most recent call last):                                    
  File "train.py", line 223, in <module>
    main(network=network)
  File "train.py", line 206, in main
    batch_size=args.batch_size, ckpt_dir=ckpt_dir, writer=writer)
  File "train.py", line 53, in train
    mags_hat = model(mels)  # mags_hat: (N, Ty, n_mags)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/git/dctts/model.py", line 75, in forward
    Z = torch.sigmoid(Z)
RuntimeError: CUDA out of memory. Tried to allocate 404.50 MiB (GPU 0; 15.75 GiB total capacity; 13.40 GiB already allocated; 225.88 MiB free; 1.13 GiB cached)

yhgon avatar May 03 '19 07:05 yhgon

Hi, yhgon

In my case, it worked on a GTX 1080 Ti. Could you try reducing the mini-batch size? You can change it in config.py.
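As a rough illustration of why a smaller mini-batch helps, here is a minimal sketch that estimates the size of a single float32 activation tensor. The shape values (1024 channels, 800 frames) are hypothetical placeholders, not values taken from this repository's config.py, and `tensor_mib` is a helper made up for this example.

```python
def tensor_mib(shape, bytes_per_element=4):
    """Return the size in MiB of a dense tensor with the given shape.

    bytes_per_element=4 assumes float32.
    """
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element / 1024 ** 2


if __name__ == "__main__":
    # Hypothetical activation shape (batch, channels, frames); the channel
    # and frame counts are placeholders, not the repository's real settings.
    for batch in (32, 16, 8):
        size = tensor_mib((batch, 1024, 800))
        print("batch=%d -> %.1f MiB per activation tensor" % (batch, size))
```

The size of each activation scales linearly with the batch dimension, and a deep network keeps many such tensors alive for the backward pass, so halving the batch size roughly halves the activation memory.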

Thank you

Yangyangii avatar May 04 '19 17:05 Yangyangii

@Yangyangii I set the batch size to 1 or 2, but I still have this issue. IMHO, it tries to load multiple checkpoint files instead of only the latest one.

yhgon avatar May 08 '19 09:05 yhgon

@yhgon

I cloned the repository and retried training with the original configuration. It worked on a GTX 1080 and needed about 8000 MB of GPU memory. When training SSRN, the code doesn't load a checkpoint file; SSRN training is an independent process. Could you give me more information (e.g. an nvidia-smi screenshot, your modifications to the original code, or the log directory)?
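Besides nvidia-smi, memory use can also be logged from inside the training process. The following is only a sketch: `gpu_memory_report` is a hypothetical helper, not part of this repository, and where to call it in train.py (for example, right before the forward pass) is an assumption. It relies on `torch.cuda.memory_allocated` and `torch.cuda.max_memory_allocated`, which report memory held by tensors in the current process.

```python
try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def gpu_memory_report(device=0):
    """Return a one-line summary of PyTorch's GPU memory use on `device`."""
    if torch is None or not torch.cuda.is_available():
        return "CUDA not available"
    mib = 1024 ** 2
    allocated = torch.cuda.memory_allocated(device) / mib
    peak = torch.cuda.max_memory_allocated(device) / mib
    return "allocated=%.1f MiB, peak=%.1f MiB" % (allocated, peak)


if __name__ == "__main__":
    # Hypothetical usage: print once per training step to see how
    # memory grows before the out-of-memory error is raised.
    print(gpu_memory_report())
```

If the reported figure is already high before the first SSRN batch, something other than the batch itself (another process, or tensors left over from earlier code) is likely holding the memory.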

Yangyangii avatar May 13 '19 09:05 Yangyangii

How long does Text2Mel training take? My run has been going for 10+ hours and is still in progress.

somasundaram97 avatar Feb 25 '21 18:02 somasundaram97