[Training] Track the progress of BERT Training
This is an issue for tracking the progress of the BERT training example. The model comes in several sizes: tiny, small, base, and large. Only tiny and small fit on a typical single GPU on GCP; for base and large, we have to move training to TPU or use ParameterServerStrategy.
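For context, a minimal sketch of how such a strategy switch might look with `tf.distribute`; the TPU address, the `TF_CONFIG` cluster setup, and the helper name `make_strategy` are assumptions for illustration, not the exact script used here.

```python
import tensorflow as tf

def make_strategy(tpu_address=None, use_parameter_server=False):
    """Pick a distribution strategy for the larger BERT sizes (sketch)."""
    if tpu_address is not None:
        # Connect to a Cloud TPU and initialize it before building the model.
        resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=tpu_address)
        tf.config.experimental_connect_to_cluster(resolver)
        tf.tpu.experimental.initialize_tpu_system(resolver)
        return tf.distribute.TPUStrategy(resolver)
    if use_parameter_server:
        # Requires a multi-worker cluster described by the TF_CONFIG env var.
        resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()
        return tf.distribute.experimental.ParameterServerStrategy(resolver)
    # Single-host fallback: mirror variables across local GPUs.
    return tf.distribute.MirroredStrategy()

strategy = make_strategy()
with strategy.scope():
    pass  # Build the model here so its variables are created under the strategy.
```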
Training config:
| MODEL NAME | NUM LAYERS (L) | HIDDEN SIZE (H) | NUM HEADS (A) | BATCH SIZE | NUM TRAIN STEPS |
|---|---|---|---|---|---|
| BERT SMALL | 4 | 512 | 8 | 256 | 50000 |
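For reference, a rough sketch of an encoder matching this row, built from `keras_nlp` layers. The vocabulary size (30522) and sequence length (128) are assumed from the common BERT defaults; the real training example also handles segment embeddings and the MLM/NSP heads.

```python
import keras_nlp
from tensorflow import keras

# Hyperparameters from the BERT SMALL row above.
NUM_LAYERS, HIDDEN_SIZE, NUM_HEADS = 4, 512, 8
# Assumed defaults, not taken from this issue.
VOCAB_SIZE, SEQ_LENGTH = 30522, 128

inputs = keras.Input(shape=(SEQ_LENGTH,), dtype="int32")
x = keras_nlp.layers.TokenAndPositionEmbedding(
    vocabulary_size=VOCAB_SIZE,
    sequence_length=SEQ_LENGTH,
    embedding_dim=HIDDEN_SIZE,
)(inputs)
for _ in range(NUM_LAYERS):
    x = keras_nlp.layers.TransformerEncoder(
        intermediate_dim=4 * HIDDEN_SIZE,  # standard 4x feed-forward width
        num_heads=NUM_HEADS,
    )(x)
encoder = keras.Model(inputs, x)
```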
Training Stats:
| loss | lm_loss | nsp_loss | lm_accuracy | nsp_accuracy |
|---|---|---|---|---|
| 3.2712 | 3.1801 | 0.0911 | 0.3717 | 0.9648 |
GLUE evaluation:
| MRPC | CoLA |
|---|---|
| 71.13 | 0 |
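For anyone reproducing these numbers, a hedged sketch of a GLUE/MRPC fine-tuning run. The vocabulary path, the checkpoint name, and the single-sentence preprocessing are placeholders; a faithful setup would pack `sentence1` and `sentence2` together with `[SEP]` markers and segment ids.

```python
import keras_nlp
import tensorflow_datasets as tfds
from tensorflow import keras

SEQ_LENGTH, BATCH_SIZE = 128, 32

tokenizer = keras_nlp.tokenizers.WordPieceTokenizer(
    vocabulary="vocab.txt",  # hypothetical path to the pretraining vocab
    sequence_length=SEQ_LENGTH,
)

def preprocess(example):
    # MRPC is a sentence-pair task; tokenizing sentence1 alone keeps the
    # sketch short but is not what the real evaluation does.
    return tokenizer(example["sentence1"]), example["label"]

mrpc = tfds.load("glue/mrpc")
train_ds = mrpc["train"].map(preprocess).batch(BATCH_SIZE)
val_ds = mrpc["validation"].map(preprocess).batch(BATCH_SIZE)

# Hypothetical path to the pretrained encoder produced by the run above.
encoder = keras.models.load_model("bert_small_pretrained")
inputs = keras.Input(shape=(SEQ_LENGTH,), dtype="int32")
features = encoder(inputs)
# Classify from the first token's representation, BERT-style.
outputs = keras.layers.Dense(2)(features[:, 0, :])
model = keras.Model(inputs, outputs)

model.compile(
    optimizer=keras.optimizers.Adam(2e-5),  # typical BERT fine-tuning LR
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(train_ds, validation_data=val_ds, epochs=3)
```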
Training config:
| MODEL NAME | NUM LAYERS (L) | HIDDEN SIZE (H) | NUM HEADS (A) | BATCH SIZE | NUM TRAIN STEPS |
|---|---|---|---|---|---|
| BERT SMALL | 4 | 512 | 8 | 256 | 500000 |
GLUE evaluation:
| MRPC |
|---|
| 73.10 |
Platform: GCE + GPU
We fixed some weight-initialization issues and reran the experiment; the result is now closer to the officially reported score.
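For reference, the original BERT draws all weights from a truncated normal with stddev 0.02, while Keras layers default to `glorot_uniform`, so a reimplementation has to set the initializer explicitly. A sketch of doing so follows; whether this matches the exact fix made here is an assumption.

```python
import keras_nlp
from tensorflow import keras

# Align layer initialization with the original BERT reference.
encoder_layer = keras_nlp.layers.TransformerEncoder(
    intermediate_dim=2048,
    num_heads=8,
    kernel_initializer=keras.initializers.TruncatedNormal(stddev=0.02),
)
```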
Training config:
| MODEL NAME | NUM LAYERS (L) | HIDDEN SIZE (H) | NUM HEADS (A) | BATCH SIZE | NUM TRAIN STEPS |
|---|---|---|---|---|---|
| BERT SMALL | 4 | 512 | 8 | 256 | 500000 |
GLUE evaluation:
| MRPC |
|---|
| 75.01 |
Platform: Cloud TPU