
[BERT/TF2] The hyperparameters for BERT-Large pretraining in the doc command are not aligned with the config script

GHGmc2 opened this issue 2 years ago · 3 comments

Related to BERT/TensorFlow2

Regarding the hyperparameters used for BERT-Large pretraining: the command in the doc is not aligned with the config in scripts/configs/pretrain_config.sh (see the quick comparison after the quoted configs below).

  • doc: https://github.com/NVIDIA/DeepLearningExamples/blob/master/TensorFlow2/LanguageModeling/BERT/README.md#pre-training

    "The following sample code trains BERT-Large from scratch on a single DGX-2 using FP16 arithmetic. This will take around 4.5 days."

    scripts/run_pretraining_lamb.sh <train_batch_size_phase1> <train_batch_size_phase2> <eval_batch_size> <learning_rate_phase1> <learning_rate_phase2> <precision> <use_xla> <num_gpus> <warmup_steps_phase1> <warmup_steps_phase2> <train_steps> <save_checkpoint_steps> <num_accumulation_phase1> <num_accumulation_steps_phase2> <bert_model>

    scripts/run_pretraining_lamb.sh 60 10 8 7.5e-4 5e-4 fp16 true 8 2000 200 7820 100 64 192 large
  • config: https://github.com/NVIDIA/DeepLearningExamples/blob/master/TensorFlow2/LanguageModeling/BERT/scripts/configs/pretrain_config.sh
# Full LAMB pretraining configs for NVIDIA DGX-2H (16x NVIDIA V100 32GB GPU)

dgx2_16gpu_fp16 ()
{
  train_batch_size_phase1=60
  train_batch_size_phase2=10
  eval_batch_size=8
  learning_rate_phase1="3.75e-4"
  learning_rate_phase2="2.5e-4"
  precision="fp16"
  use_xla="true"
  num_gpus=16
  warmup_steps_phase1=2133
  warmup_steps_phase2=213
  train_steps=8341
  save_checkpoints_steps=100
  num_accumulation_steps_phase1=64
  num_accumulation_steps_phase2=192
  echo $train_batch_size_phase1 $train_batch_size_phase2 $eval_batch_size $learning_rate_phase1 $learning_rate_phase2 $precision $use_xla $num_gpus $warmup_steps_phase1 $warmup_steps_phase2 $train_steps $save_checkpoints_steps $num_accumulation_steps_phase1 $num_accumulation_steps_phase2
}

# Full LAMB pretraining configs for NVIDIA DGX-1 (8x NVIDIA V100 32GB GPU)

dgx1_8gpu_fp16 ()
{
  train_batch_size_phase1=60
  train_batch_size_phase2=10
  eval_batch_size=8
  learning_rate_phase1="7.5e-4"
  learning_rate_phase2="5e-4"
  precision="fp16"
  use_xla="true"
  num_gpus=8
  warmup_steps_phase1=2133
  warmup_steps_phase2=213
  train_steps=8341
  save_checkpoints_steps=100
  num_accumulation_steps_phase1=128
  num_accumulation_steps_phase2=384
  echo $train_batch_size_phase1 $train_batch_size_phase2 $eval_batch_size $learning_rate_phase1 $learning_rate_phase2 $precision $use_xla $num_gpus $warmup_steps_phase1 $warmup_steps_phase2 $train_steps $save_checkpoints_steps $num_accumulation_steps_phase1 $num_accumulation_steps_phase2
}
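
For context, here is a quick comparison of the effective global batch sizes the two sets of numbers imply. This is only a rough sketch, assuming the effective batch works out to per-GPU batch × num_accumulation_steps × num_gpus:

    # rough check (assumption): effective batch = per-GPU batch * accumulation steps * num GPUs
    echo "doc command      phase1: $((60 * 64 * 8))    phase2: $((10 * 192 * 8))"    # 30720 / 15360
    echo "dgx2_16gpu_fp16  phase1: $((60 * 64 * 16))   phase2: $((10 * 192 * 16))"   # 61440 / 30720
    echo "dgx1_8gpu_fp16   phase1: $((60 * 128 * 8))   phase2: $((10 * 384 * 8))"    # 61440 / 30720

If that assumption holds, the README command ends up with roughly half the global batch size of either config, which is why I'm unsure which set of hyperparameters is intended.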

Which one should I follow to run BERT-Large pretraining from scratch?

GHGmc2 avatar May 09 '23 02:05 GHGmc2

The config file contains the correct hyperparameters for BERT-Large pretraining.
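
For example, one way to launch with those values (a sketch only, assuming the config function's echo output maps one-to-one onto the script's positional arguments, with the model size appended at the end):

    # hypothetical invocation using the DGX-1 8-GPU FP16 config
    bash scripts/run_pretraining_lamb.sh $(source scripts/configs/pretrain_config.sh && dgx1_8gpu_fp16) large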

meatybobby avatar May 09 '23 17:05 meatybobby

@meatybobby Thanks for the confirmation. Is there any plan to update the docs?

BTW, it seems the A100 config is for the 80GB variant, not 40GB. Could you confirm that as well?

GHGmc2 avatar May 10 '23 01:05 GHGmc2

Yes, we will update the README later. And the A100 config should indeed say 80GB; we will update that as well. Thank you for the correction.

meatybobby avatar May 10 '23 05:05 meatybobby