
Does the multi-label classifier optimize the entire model, or does it freeze BERT?

Open fabrahman opened this issue 5 years ago • 5 comments

I wonder whether the default code optimizes the entire model end-to-end, or whether only the additional classifier-layer parameters get updated?

Thanks

fabrahman avatar Mar 26 '20 21:03 fabrahman

I noticed the freeze/requires_grad logic is commented out in learner_cls. @kaushaltrivedi how does one freeze the BERT layers and train only the added custom layer?

aaronbriel avatar Apr 03 '20 13:04 aaronbriel

I tried adding a freeze_transformers_layer conditional in BertLearner's init function that set requires_grad to False for any parameter in named_parameters whose name contains the model_type, but it didn't seem to have any effect. This approach worked for me in another implementation, dramatically reducing training times.
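For reference, here is a minimal standalone sketch of that approach (not fast-bert's actual code; it assumes a Hugging Face BertForSequenceClassification model and "bert" as the model_type prefix in named_parameters):

```python
import torch
from transformers import BertForSequenceClassification

# Hypothetical standalone example of the freezing approach described above.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=4
)

model_type = "bert"  # assumption: matches the prefix seen in named_parameters
for name, param in model.named_parameters():
    if model_type in name:
        # Freeze everything except the classifier head.
        param.requires_grad = False

# Build the optimizer after freezing, over the trainable parameters only.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)
```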

aaronbriel avatar Apr 03 '20 15:04 aaronbriel

https://github.com/kaushaltrivedi/fast-bert/pull/195

aaronbriel avatar Apr 14 '20 16:04 aaronbriel

Just for clarification: freezing the layers means that only a single linear layer (i.e. the classifier head?) is trained for classification, rather than all the layers having their weights updated during training?

lingdoc avatar Apr 15 '20 03:04 lingdoc

That is correct. Note that I did confirm this by looping through all parameters and printing each name along with its requires_grad setting (summarized for brevity):

```
bert.embeddings.word_embeddings.weight, requires_grad:False
bert.embeddings.position_embeddings.weight, requires_grad:False
bert.embeddings.token_type_embeddings.weight, requires_grad:False
...
bert.encoder.layer.11.output.LayerNorm.bias, requires_grad:False
bert.pooler.dense.weight, requires_grad:False
bert.pooler.dense.bias, requires_grad:False
classifier.weight, requires_grad:True
classifier.bias, requires_grad:True
```
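For anyone wanting to reproduce this check, a loop along these lines (with model being the learner's underlying PyTorch model) produces the listing above:

```python
# Minimal verification sketch: print every parameter name with its
# requires_grad flag; only the classifier head should report True.
for name, param in model.named_parameters():
    print(f"{name}, requires_grad:{param.requires_grad}")
```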

aaronbriel avatar Apr 15 '20 14:04 aaronbriel