CUDA out of memory
Hello, I'm training on the XL part of GigaSpeech using 4*A100 80G, but it still reports CUDA out of memory. How can I fix this?
Decrease max_num_tokens or increase gradient_accumulation_steps.
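If it helps, gradient accumulation in plain PyTorch looks roughly like this. The model, optimizer, and loader below are toy stand-ins for whatever the recipe actually builds, not this repo's objects; only the accumulation pattern matters:

```python
import torch
from torch import nn, optim

# Toy stand-ins; in the real script these come from the recipe's config.
model = nn.Linear(16, 1)
optimizer = optim.AdamW(model.parameters(), lr=1e-3)
loader = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(8)]

accumulation_steps = 4  # 4x the effective batch size at no extra memory cost

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = nn.functional.mse_loss(model(x), y)
    # Scale the loss so the accumulated gradients match one large-batch update.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```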
Hi - to apply the above, is this batch_size and max_len? Note that I was unable to fix OOM on small GPUs using the prescribed methods: export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 (fill in a value for the MB), or export CUDA_VISIBLE_DEVICES=0,1,2,3 (with chosen device IDs). I can confirm that altering batch_size and max_len allowed my initial training to proceed.
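For completeness, the same two settings can be applied from inside a Python script rather than the shell; a minimal sketch (the values are examples, and both variables must be set before the first CUDA call to take effect):

```python
import os

# Set before any CUDA initialization, i.e. at the very top of the script.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

import torch
print(torch.cuda.device_count())  # reflects CUDA_VISIBLE_DEVICES
```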
Restricting audio length to less than 16s finally fixed this problem.
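In case it's useful to others, a minimal sketch of dropping long utterances before batching; the manifest layout here (dicts with a "duration" field in seconds) is an assumption, not this repo's actual data format:

```python
# Drop any utterance longer than the cap before it reaches the sampler.
MAX_SECONDS = 16.0

def filter_long_utterances(manifest):
    kept = [u for u in manifest if u["duration"] <= MAX_SECONDS]
    dropped = len(manifest) - len(kept)
    print(f"dropped {dropped} utterances longer than {MAX_SECONDS}s")
    return kept

manifest = [
    {"id": "utt1", "duration": 4.2},
    {"id": "utt2", "duration": 19.7},  # would be dropped
]
print(filter_long_utterances(manifest))
```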
@WoBuChiTang hi, I need to train this model on a long-audio dataset (clips up to 20 seconds long). Curious what's the maximum max_num_tokens you were able to pull off with your long dataset using 4*A100 80G?
max_num_tokens: 35000; the longer the audio, the more memory it uses. For your case, I recommend 30000 or 20000.
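To illustrate why longer audio forces a smaller max_num_tokens: each batch is padded to its longest utterance, so memory scales with batch_size times max_len, and capping that product bounds peak usage. A generic sketch of token-budget batching (not this repo's sampler):

```python
# Group utterances into batches whose padded size, i.e. batch_size * longest
# utterance, never exceeds the token budget.
def make_batches(lengths, max_num_tokens):
    batches, current = [], []
    for n in sorted(lengths):
        longest = max(current + [n]) if current else n
        if current and (len(current) + 1) * longest > max_num_tokens:
            batches.append(current)
            current = []
        current.append(n)
    if current:
        batches.append(current)
    return batches

lengths = [800, 1200, 1500, 3000, 3500]  # tokens per utterance
for b in make_batches(lengths, max_num_tokens=5000):
    print(b, "padded cost =", len(b) * max(b))
```

With a budget of 5000, the short utterances share one batch while each long one gets its own, which is exactly the effect of lowering max_num_tokens for a 20-second dataset.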