
[Training] Track the progress of BERT Training

chenmoneygithub opened this issue on May 09 '22 · 3 comments

This is an issue for tracking the progress of training the BERT example. The model comes in different sizes: tiny, small, base, and large. Only tiny and small fit on a common GPU on GCP; for base and large, we have to move training to a TPU or use ParameterServerStrategy.
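For reference, a rough sketch of the two scale-up paths with `tf.distribute`; the resolver arguments are assumptions that depend on the actual Cloud TPU / cluster setup, not values from our scripts:

```python
import tensorflow as tf

# TPU path: connect to a Cloud TPU and build a TPUStrategy.
# tpu="local" is an assumption; on GCE it may be the TPU's name instead.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="local")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Alternative GPU path: ParameterServerStrategy, with the cluster
# described by a TF_CONFIG environment variable (hypothetical cluster).
# strategy = tf.distribute.ParameterServerStrategy(
#     tf.distribute.cluster_resolver.TFConfigClusterResolver()
# )

with strategy.scope():
    model = ...  # build the BERT model under the strategy scope
```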

chenmoneygithub commented on May 09 '22

Training config:

| MODEL NAME | NUM LAYERS (L) | HIDDEN SIZE (H) | NUM HEADS (A) | BATCH SIZE | NUM TRAIN STEPS |
|------------|----------------|-----------------|---------------|------------|-----------------|
| BERT SMALL | 4 | 512 | 8 | 256 | 50000 |
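For concreteness, this footprint maps onto `keras_nlp.models.BertBackbone` (available in later KerasNLP releases) roughly as sketched below; `vocabulary_size` and `intermediate_dim` are assumptions following the original BERT Small conventions (intermediate = 4 × H, WordPiece vocab of 30522), not values read from the training script:

```python
import keras_nlp

# BERT Small footprint: L=4 layers, H=512 hidden size, A=8 heads.
backbone = keras_nlp.models.BertBackbone(
    vocabulary_size=30522,   # assumption: original BERT WordPiece vocab
    num_layers=4,
    num_heads=8,
    hidden_dim=512,
    intermediate_dim=2048,   # assumption: 4 * hidden_dim, per the paper
)
backbone.summary()
```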

Training Stats:

| loss | lm_loss | nsp_loss | lm_accuracy | nsp_accuracy |
|--------|---------|----------|-------------|--------------|
| 3.2712 | 3.1801 | 0.0911 | 0.3717 | 0.9648 |
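The total loss above is just the sum of the two pretraining objectives (3.1801 + 0.0911 = 3.2712). A minimal sketch of that decomposition, with hypothetical logits/labels names:

```python
import tensorflow as tf

# Masked-LM loss: cross-entropy over the masked token positions.
# NSP loss: cross-entropy on the 2-class next-sentence label.
lm_loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
nsp_loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def pretraining_loss(lm_labels, lm_logits, nsp_labels, nsp_logits):
    lm_loss = lm_loss_fn(lm_labels, lm_logits)
    nsp_loss = nsp_loss_fn(nsp_labels, nsp_logits)
    # The reported "loss" column is this sum.
    return lm_loss + nsp_loss
```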

GLUE evaluation:

| mrpc | cola |
|-------|------|
| 71.13 | 0 |
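Note the metrics differ per task: MRPC is reported here as accuracy (GLUE also reports F1), while CoLA uses Matthews correlation, so a 0 on CoLA means the classifier is no better than chance on that task. A sketch of the two metrics, assuming scikit-learn is available and using made-up labels:

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Hypothetical predictions/labels, for illustration only.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

mrpc_score = accuracy_score(y_true, y_pred) * 100     # MRPC: accuracy
cola_score = matthews_corrcoef(y_true, y_pred) * 100  # CoLA: Matthews corr.
```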

chenmoneygithub commented on May 10 '22

Training config:

| MODEL NAME | NUM LAYERS (L) | HIDDEN SIZE (H) | NUM HEADS (A) | BATCH SIZE | NUM TRAIN STEPS |
|------------|----------------|-----------------|---------------|------------|-----------------|
| BERT SMALL | 4 | 512 | 8 | 256 | 500000 |
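With the step budget raised 10x here, the learning-rate schedule matters: the original BERT recipe warms up linearly and then decays linearly to zero over the full run. A minimal sketch; the peak LR and warmup length are assumptions, not values from our training script:

```python
import tensorflow as tf

NUM_TRAIN_STEPS = 500_000
WARMUP_STEPS = 10_000  # assumption, matching the original BERT recipe
PEAK_LR = 1e-4         # assumption

# Linear decay from PEAK_LR to 0 over the full run.
decay = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=PEAK_LR,
    decay_steps=NUM_TRAIN_STEPS,
    end_learning_rate=0.0,
)

class WarmupSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Linear warmup into the linear-decay schedule above."""

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        warmup_lr = PEAK_LR * step / WARMUP_STEPS
        return tf.cond(step < WARMUP_STEPS,
                       lambda: warmup_lr,
                       lambda: decay(step))

optimizer = tf.keras.optimizers.Adam(learning_rate=WarmupSchedule())
```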

GLUE evaluation:

| mrpc |
|-------|
| 73.10 |

Platform: GCE + GPU

chenmoneygithub commented on May 17 '22

We fixed some initialization issues and reran the experiment; the score is now closer to the officially reported one.
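For context, the reference BERT implementation initializes dense and embedding weights with a truncated normal of stddev 0.02; mismatches there are the kind of issue referenced above, though the sketch below is illustrative rather than the exact patch:

```python
import tensorflow as tf

# BERT's standard weight init: truncated normal, stddev=0.02.
bert_initializer = tf.keras.initializers.TruncatedNormal(stddev=0.02)

dense = tf.keras.layers.Dense(
    512,
    kernel_initializer=bert_initializer,
)
```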

Training config:

| MODEL NAME | NUM LAYERS (L) | HIDDEN SIZE (H) | NUM HEADS (A) | BATCH SIZE | NUM TRAIN STEPS |
|------------|----------------|-----------------|---------------|------------|-----------------|
| BERT SMALL | 4 | 512 | 8 | 256 | 500000 |

GLUE evaluation:

| mrpc |
|-------|
| 75.01 |

Platform: Cloud TPU

chenmoneygithub commented on May 21 '22