Weight decay for BERT models
Hi! I noticed that in your code for the BERT AdamW optimizer, you only apply weight decay to parameters whose names contain the strings bias or LayerNorm.weight:
https://github.com/facebookresearch/BalancingGroups/blob/72d31e56e168b8ab03348810d4c5bac0f8a90a7a/models.py#L41-L45
The original group DRO code seems to do the opposite, applying weight decay to all parameters except those:
https://github.com/kohpangwei/group_DRO/blob/master/train.py#L111-L114
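For comparison, here is a minimal sketch of the grouping I would expect, following the group_DRO snippet linked above (the model name, learning rate, and weight decay value are placeholders, not values taken from either repo). Swapping the `any` / `not any` conditions gives the inverted behaviour I'm describing:

```python
import torch
from transformers import BertForSequenceClassification

# Placeholder model and hyperparameters, for illustration only.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
weight_decay = 0.01

# group_DRO / usual Hugging Face convention: biases and LayerNorm weights are
# excluded from weight decay, every other parameter gets it.
no_decay = ["bias", "LayerNorm.weight"]
optimizer_grouped_parameters = [
    {
        "params": [p for n, p in model.named_parameters()
                   if not any(nd in n for nd in no_decay)],
        "weight_decay": weight_decay,
    },
    {
        "params": [p for n, p in model.named_parameters()
                   if any(nd in n for nd in no_decay)],
        "weight_decay": 0.0,
    },
]
optimizer = torch.optim.AdamW(optimizer_grouped_parameters, lr=2e-5)
```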