composer
composer copied to clipboard
Update checks for Gated Linear Units Method
GLU applies surgery to any model that has architecture modules BertIntermediate and BertOutput. Not all models have these modules; for example, DistilBert from HuggingFace has different names for the modules.
There are 2 issues that are addressed in this PR:
- The GLU algorithm should not have a strict check for our Composer
HuggingFaceModelwrapper - As far as I can tell, Hugging Face models that have modules
BertIntermediateandBertOutputmodules areBertModelandBertPreTrainedModel.
Confusingly, BertModel is a subclass of BertPreTrainedModel.
Autoclasses allow users to automatically retrieve the relevant model given the name/path, and models instantiated via
model = transformers.AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
are actually subclasses of BertPreTrainedModel and not BertModel, while
model = transformers.AutoModel.from_pretrained('bert-base-uncased', num_labels=2)
is a subclass of BertModel.
Additionally, BertForSequenceClassification and BertForMaskedLM are subclasses of BertPreTrainedModel (not BertModel!).
I've therefore changed the code to the following:
# ensure that the model is an instance of a Hugging Face BertPreTrainedModel, since our replacement policy is only defined for BERTs
if not isinstance(unwrapped_model, BertPreTrainedModel):
raise TypeError('Gated Linear Units only has a surgery policy defined for instances of the transformers.BertPreTrainedModel')
For a quick check, see this google colab
- [x] ~I still need to do a full BERT run with GLU to confirm that the changes below don't affect our standard BERT+GLU runs~
Here is a run in w&b that uses GLU from this branch: https://wandb.ai/mosaic-ml/glu-test-2022-10-04/workspace?workspace=user-jportes and the corresponding yaml https://github.com/jacobfulano/composer/blob/glu-hotfix-2022-10-04/experiments/test-glu.yaml
@nik-mosaic should we do a similar change for algorithms in performance repo?
@nik-mosaic should we do a similar change for algorithms in performance repo?
Yes, it will be good for all our algorithms to accurately specify the class they apply to.
@dskhudia @jacobfulano is this still relevant? if so, can we merge? CC: @nik-mosaic if we do want to do these changes in performance
@mvpatel2000 this is relevant and the branch is updated. Please feel free to approve!