VoiceCraft

CUDA out of memory

Open · WoBuChiTang opened this issue on Apr 10 '24

Hello, I'm training on the XL subset of GigaSpeech using 4×A100 80GB, but it still reports CUDA out of memory. How can I fix it?

WoBuChiTang · Apr 10 '24

Decrease max_num_tokens or increase gradient_accumulation_steps.

jasonppy · Apr 10 '24
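
For context, a minimal sketch of how those two knobs might be passed to the training script. The flag names come from the reply above; the script name, values, and everything else are placeholders to adapt to your own setup:

```bash
# Minimal sketch: the two knobs from the reply above, collected as flags.
# Flag names are taken from this thread; the script name and values are
# placeholders, not the repo's canonical invocation.
EXTRA_ARGS="--max_num_tokens 20000 --gradient_accumulation_steps 8"
# e.g.: torchrun --nproc_per_node=4 main.py $EXTRA_ARGS <your other args>
```

Lowering max_num_tokens shrinks the per-step token budget directly; raising gradient_accumulation_steps lets you keep the effective batch size while each individual step fits in memory.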

Hi - to apply the above, is this batch_size and max_len? Note that I was unable to fix OOM on small GPUs using the commonly prescribed methods: export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 (fill in an MB value), or export CUDA_VISIBLE_DEVICES=0,1,2,3 (with your chosen device IDs). I can confirm that altering batch_size and max_len allowed my initial training to proceed (both attempts are collected in the sketch below).

truedat101 · Apr 11 '24
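
For reference, the two workarounds mentioned above as runnable commands. Both are standard PyTorch/CUDA environment variables; the values shown are examples to tune for your hardware:

```bash
# Standard PyTorch/CUDA environment variables (values are examples):
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512  # pick an MB value for your GPU
export CUDA_VISIBLE_DEVICES=0,1,2,3                   # pick your device IDs
# Per the comment above, neither fixed the OOM on small GPUs;
# lowering batch_size and max_len in the training config did.
```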

Restricting audio length to less than 16 s finally fixed this problem.

WoBuChiTang · Apr 15 '24
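
For anyone wanting to apply the same fix, here is a minimal shell sketch of filtering a corpus by duration. It assumes sox and bc are installed (soxi -D prints a file's duration in seconds); the data/*.wav glob and keep_list.txt are hypothetical paths:

```bash
# Keep only clips shorter than 16 s; write the surviving paths to a list.
: > keep_list.txt
for f in data/*.wav; do
  dur=$(soxi -D "$f")                       # duration in seconds (float)
  if (( $(echo "$dur < 16" | bc -l) )); then
    echo "$f" >> keep_list.txt
  fi
done
```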

@WoBuChiTang Hi, I need to train this model on a dataset of long audio (up to 20 seconds per clip). Curious: what's the largest max_num_tokens you were able to pull off with your long dataset on 4×A100 80GB?

thivux · Jun 16 '24

max_num_tokens: 35000, since memory use grows the later you get in training. For your case, I'd recommend 30000 or 20000.

WoBuChiTang · Jun 20 '24
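
To make the budget concrete, a rough back-of-envelope calculation. It assumes max_num_tokens counts codec frames and that the codec runs at roughly 50 frames per second; both are assumptions, so check your own codec config:

```bash
# Back-of-envelope: how many 20 s clips fit in a 30000-token batch budget,
# assuming ~50 codec frames per second (an assumption; verify locally).
echo $(( 20 * 50 ))       # ~1000 frames in a 20 s clip
echo $(( 30000 / 1000 ))  # ~30 such clips per batch at max_num_tokens=30000
```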