composer Update checks for Gated Linear Units Method

GLU applies surgery to any model that has architecture modules BertIntermediate and BertOutput. Not all models have these modules; for example, DistilBert from HuggingFace has different names for the modules.

There are 2 issues that are addressed in this PR:

The GLU algorithm should not have a strict check for our Composer HuggingFaceModel wrapper
As far as I can tell, Hugging Face models that have modules BertIntermediate and BertOutput modules are BertModel and BertPreTrainedModel.

Confusingly, BertModel is a subclass of BertPreTrainedModel.

Autoclasses allow users to automatically retrieve the relevant model given the name/path, and models instantiated via

model = transformers.AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

are actually subclasses of BertPreTrainedModel and not BertModel, while

model = transformers.AutoModel.from_pretrained('bert-base-uncased', num_labels=2)

is a subclass of BertModel.

Additionally, BertForSequenceClassification and BertForMaskedLM are subclasses of BertPreTrainedModel (not BertModel!).

I've therefore changed the code to the following:

# ensure that the model is an instance of a Hugging Face BertPreTrainedModel, since our replacement policy is only defined for BERTs
if not isinstance(unwrapped_model, BertPreTrainedModel):
    raise TypeError('Gated Linear Units only has a surgery policy defined for instances of the transformers.BertPreTrainedModel')

For a quick check, see this google colab

[x] ~I still need to do a full BERT run with GLU to confirm that the changes below don't affect our standard BERT+GLU runs~

Here is a run in w&b that uses GLU from this branch: https://wandb.ai/mosaic-ml/glu-test-2022-10-04/workspace?workspace=user-jportes and the corresponding yaml https://github.com/jacobfulano/composer/blob/glu-hotfix-2022-10-04/experiments/test-glu.yaml

Sep 30 '22 04:09 jacobfulano

@nik-mosaic should we do a similar change for algorithms in performance repo?

Sep 30 '22 05:09 dskhudia

@nik-mosaic should we do a similar change for algorithms in performance repo?

Yes, it will be good for all our algorithms to accurately specify the class they apply to.

Sep 30 '22 20:09 nik-mosaic

@dskhudia @jacobfulano is this still relevant? if so, can we merge? CC: @nik-mosaic if we do want to do these changes in performance

Oct 20 '22 20:10 mvpatel2000

@mvpatel2000 this is relevant and the branch is updated. Please feel free to approve!

Oct 26 '22 18:10 jacobfulano

composer composer copied to clipboard

Update checks for Gated Linear Units Method

composer
composer copied to clipboard