GPU memory error occurs at epoch 50 during first-stage model training.

Open kadirnar opened this issue 9 months ago • 4 comments

I'm training a model on 8x H100 GPUs. However, I'm getting a GPU memory error at epoch 50. How can I fix this? @yl4579

kadirnar avatar Mar 09 '25 06:03 kadirnar

Image

kadirnar avatar Mar 09 '25 06:03 kadirnar

@kadirnar did you find a solution for this?

iAdityaVishnu avatar Apr 17 '25 18:04 iAdityaVishnu

> @kadirnar did you find a solution for this?

I don't remember. You can check this for my latest attempts:

https://github.com/Respaired/Tsukasa-Speech/issues/6

kadirnar avatar Apr 18 '25 00:04 kadirnar

Epoch 50 of 1st stage training is where the TMA code kicks in, so you'll need to lower your batch size considerably at that point and keep the lower batch size until the end of stage 1. The same thing happens at epoch 20 of 2nd stage training: when you get an OOM there, roughly halve the batch size and continue from the last checkpoint.
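As a rough illustration of that batch-size schedule, here is a minimal Python sketch. The helper `batch_size_for_epoch` is hypothetical and not part of StyleTTS2. The thresholds (epoch 50 in stage 1, epoch 20 in stage 2) come from this comment, while the reduction factor of 0.5 and the assumption that the threshold epoch itself already needs the smaller batch are guesses you should tune for your own setup.

```python
# Hypothetical helper, NOT part of StyleTTS2: picks a batch size for a given
# training stage and epoch, shrinking it once memory usage is expected to jump.

# Epochs at which memory usage reportedly increases (from the comment above).
MEMORY_JUMP_EPOCH = {1: 50, 2: 20}

# Assumed reduction factors; "considerably" (stage 1) and "about halve"
# (stage 2) are approximated as 0.5 here and should be tuned per machine.
ASSUMED_FACTOR = {1: 0.5, 2: 0.5}


def batch_size_for_epoch(stage, epoch, base_batch_size, factor=None):
    """Return the batch size to use for `epoch` of training stage `stage`."""
    if stage not in MEMORY_JUMP_EPOCH:
        raise ValueError(f"unknown training stage: {stage!r}")
    if epoch < MEMORY_JUMP_EPOCH[stage]:
        # Before the threshold, keep the original batch size.
        return base_batch_size
    scale = ASSUMED_FACTOR[stage] if factor is None else factor
    # Never go below a batch size of 1.
    return max(1, int(base_batch_size * scale))


if __name__ == "__main__":
    for stage, epoch in [(1, 49), (1, 50), (2, 19), (2, 20)]:
        print(stage, epoch, batch_size_for_epoch(stage, epoch, 16))
```

In practice this means stopping the run before the threshold epoch (or when the OOM occurs), writing the smaller value into the training config, and resuming from the last saved checkpoint.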

As a side note, StyleTTS2 does not currently work on H100 hardware.

martinambrus avatar May 02 '25 17:05 martinambrus