
CUDA OOM with 20B model

Open gaarutyunov opened this issue 2 years ago • 0 comments

I am trying to finetune the 20B model on the APPS dataset using the slim weights. The config is identical to the one provided in the repository, with some tweaks (listed below), but I am constantly getting an OOM error.

Changes to the configuration:

  • gradient_accumulation_steps: tried different values [1-32]
  • train_micro_batch_size_per_gpu: same as gradient_accumulation_steps
  • zero_optimization: Only stage 1 works; CPU offload doesn't. Tried changing the "reduce_bucket_size" parameter and others accordingly.
  • pipe-parallel-size and model-parallel-size: 1x2, 2x2, 4x2. Tried different combinations depending on the number of GPUs available.
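For reference, the tweaks above would look roughly like this in the YAML config (the values here are illustrative placeholders, not a known-good recipe):

```yaml
# Sketch of the settings I have been varying (values illustrative):
"pipe-parallel-size": 4
"model-parallel-size": 2
"train_micro_batch_size_per_gpu": 4
"gradient_accumulation_steps": 4
"zero_optimization": {
  "stage": 1,
  "reduce_bucket_size": 500000000
}
```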

Setups I tried:

The only setup that worked was 8 x NVIDIA A100 80 GB SXM. Sadly, that run failed because of an unrelated configuration mistake. The problem is that now I have to wait for days or weeks to run the finetuning process again: I am using my university cluster, which has only 6 nodes with that configuration, and they are always occupied.

Could you please comment on how to finetune the model properly with 2 x NVIDIA Tesla V100 32 GB NVLink or 2 x NVIDIA A100 80 GB SXM? What should the configuration be? Is it even possible?
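For context, here is my back-of-envelope memory estimate (my own assumptions: fp16 weights and gradients, fp32 Adam master weights and moment buffers, activations ignored), which is why I suspect small setups may be infeasible without offload:

```python
# Rough GPU memory estimate for finetuning a 20B-parameter model
# with Adam in mixed precision. Assumptions (mine, not from the repo):
# fp16 weights + grads, fp32 master weights + two Adam moment buffers.
PARAMS = 20e9

fp16_weights = PARAMS * 2      # 2 bytes per fp16 parameter
fp16_grads = PARAMS * 2        # 2 bytes per fp16 gradient
fp32_master = PARAMS * 4       # 4 bytes per fp32 master weight
adam_moments = PARAMS * 4 * 2  # two fp32 moment buffers

total_gb = (fp16_weights + fp16_grads + fp32_master + adam_moments) / 1e9
print(f"{total_gb:.0f} GB")  # ~320 GB before activations
```

By this estimate, even 2 x A100 80 GB (160 GB total) falls well short unless optimizer states are sharded or offloaded.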

gaarutyunov, May 04 '22 16:05