[BERT/TF2] The hyperparameters for BERT-Large pretraining in the command from the doc are not aligned with the config script
Related to BERT/TensorFlow2
Regarding the hyperparameters used for BERT-Large pretraining, the command in the doc does not match the config in scripts/configs/pretrain_config.sh.
- doc: https://github.com/NVIDIA/DeepLearningExamples/blob/master/TensorFlow2/LanguageModeling/BERT/README.md#pre-training
  "The following sample code trains BERT-Large from scratch on a single DGX-2 using FP16 arithmetic. This will take around 4.5 days."
  scripts/run_pretraining_lamb.sh <train_batch_size_phase1> <train_batch_size_phase2> <eval_batch_size> <learning_rate_phase1> <learning_rate_phase2> <precision> <use_xla> <num_gpus> <warmup_steps_phase1> <warmup_steps_phase2> <train_steps> <save_checkpoint_steps> <num_accumulation_phase1> <num_accumulation_steps_phase2> <bert_model>
  scripts/run_pretraining_lamb.sh 60 10 8 7.5e-4 5e-4 fp16 true 8 2000 200 7820 100 64 192 large
  (See the sketch after the config excerpt below for how these positional arguments line up with the config function's output.)
- config: https://github.com/NVIDIA/DeepLearningExamples/blob/master/TensorFlow2/LanguageModeling/BERT/scripts/configs/pretrain_config.sh
# Full LAMB pretraining configs for NVIDIA DGX-2H (16x NVIDIA V100 32GB GPU)
dgx2_16gpu_fp16 ()
{
train_batch_size_phase1=60
train_batch_size_phase2=10
eval_batch_size=8
learning_rate_phase1="3.75e-4"
learning_rate_phase2="2.5e-4"
precision="fp16"
use_xla="true"
num_gpus=16
warmup_steps_phase1=2133
warmup_steps_phase2=213
train_steps=8341
save_checkpoints_steps=100
num_accumulation_steps_phase1=64
num_accumulation_steps_phase2=192
echo $train_batch_size_phase1 $train_batch_size_phase2 $eval_batch_size $learning_rate_phase1 $learning_rate_phase2 $precision $use_xla $num_gpus $warmup_steps_phase1 $warmup_steps_phase2 $train_steps $save_checkpoints_steps $num_accumulation_steps_phase1 $num_accumulation_steps_phase2
}
# Full LAMB pretraining configs for NVIDIA DGX-1 (8x NVIDIA V100 32GB GPU)
dgx1_8gpu_fp16 ()
{
train_batch_size_phase1=60
train_batch_size_phase2=10
eval_batch_size=8
learning_rate_phase1="7.5e-4"
learning_rate_phase2="5e-4"
precision="fp16"
use_xla="true"
num_gpus=8
warmup_steps_phase1=2133
warmup_steps_phase2=213
train_steps=8341
save_checkpoints_steps=100
num_accumulation_steps_phase1=128
num_accumulation_steps_phase2=384
echo $train_batch_size_phase1 $train_batch_size_phase2 $eval_batch_size $learning_rate_phase1 $learning_rate_phase2 $precision $use_xla $num_gpus $warmup_steps_phase1 $warmup_steps_phase2 $train_steps $save_checkpoints_steps $num_accumulation_steps_phase1 $num_accumulation_steps_phase2
}
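For reference, a minimal sketch of how the two are presumably meant to fit together: the config functions echo the first 14 positional arguments in the order run_pretraining_lamb.sh expects, so one could source the config and pass its output straight through. The trailing bert_model argument is not part of the echo, so "large" is appended here based on the README's usage line (this invocation pattern is my assumption, not something stated in the repo).

# Sketch only: drive the pretraining script from the config function
# instead of hard-coding the numbers (assumes the echoed values match
# the positional argument order shown in the README usage line).
source scripts/configs/pretrain_config.sh
bash scripts/run_pretraining_lamb.sh $(dgx1_8gpu_fp16) large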
Which one should I follow to run BERT large pretraining from scratch?
The config file contains the correct hyperparameters for BERT-Large pretraining.
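To make that concrete, plugging the dgx1_8gpu_fp16 values quoted above into the README's usage line (the trailing large comes from the README example, since the config does not set bert_model) would presumably give:

scripts/run_pretraining_lamb.sh 60 10 8 7.5e-4 5e-4 fp16 true 8 2133 213 8341 100 128 384 large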
@meatybobby Thanks for the confirmation. Any plans to update the docs?
BTW, it seems the A100 config is for the 80GB variant, not 40GB. Can you confirm that as well?
Yes, we will update the README later. And the A100 config should indeed be 80GB; we will update this as well. Thank you for the correction.