Not getting any speedup when using adapters for DistilBert over regular fine-tuning
Hello,
I'm comparing the speedup and accuracy tradeoff I can get on a DistilBert model. In my trials, I have not seen any speedup (in seconds per epoch) from any of the adapters in this library, while taking some hits in model performance.
I load an adapter model and a regular DistilBert model like so:
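from transformers import AutoModelForSequenceClassification
from transformers.adapters import AdapterConfig, AutoAdapterModel, MAMConfig, PrefixTuningConfig

def get_regular_model(checkpoint: str):
    # Baseline: full fine-tuning of every weight.
    return AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

def get_adapter_model(checkpoint: str):
    model = AutoAdapterModel.from_pretrained(checkpoint)
    model.add_classification_head('classifier', num_labels=2)
    # Mix-and-Match: prefix tuning combined with a parallel bottleneck adapter.
    config = MAMConfig(
        PrefixTuningConfig(flat=False, prefix_length=10),
        AdapterConfig(mh_adapter=True, output_adapter=True, reduction_factor=16,
                      non_linearity="relu", is_parallel=True),
    )
    model.add_adapter("mam_adapter", config=config)
    # Freeze the base model weights and activate the adapter for training.
    model.train_adapter('mam_adapter')
    return model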
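One quick sanity check on the two loaders above is to compare how many parameters each model actually trains. The helper below is a minimal sketch, not part of the adapters library: `count_parameters` is a hypothetical name, and it only assumes the model exposes a PyTorch-style `parameters()` iterator whose items have `numel()` and `requires_grad`.

```python
from typing import Tuple


def count_parameters(model) -> Tuple[int, int]:
    """Return (trainable, total) parameter counts for a PyTorch-style module.

    Hypothetical helper: relies only on `model.parameters()` yielding
    objects with `numel()` and `requires_grad`.
    """
    trainable = 0
    total = 0
    for p in model.parameters():
        n = p.numel()
        total += n
        if p.requires_grad:
            trainable += n
    return trainable, total
```

Calling `count_parameters(get_adapter_model(checkpoint))` should report a trainable count well below the total if the base weights were frozen as intended, while the regular model should report the two counts as equal.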
I then run these models through the same training epoch, defined as:
from tqdm import tqdm

def train_epoch(model, train_dataloader, opt):
    # `device` is assumed to be defined at module level.
    model.train()
    pbar = tqdm(range(len(train_dataloader)))
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        pbar.update(1)
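For reference, the seconds-per-epoch comparison could be measured along the following lines. This is only a sketch under stated assumptions: `time_epochs` is a hypothetical helper, `run_epoch` stands for any zero-argument callable (for example `lambda: train_epoch(model, train_dataloader, opt)`), and the stand-in workload in the usage section is arbitrary.

```python
import time
from statistics import mean
from typing import Callable, List


def time_epochs(run_epoch: Callable[[], None], n_epochs: int = 3, warmup: int = 1) -> List[float]:
    """Return wall-clock seconds for each timed epoch, after `warmup` untimed runs."""
    for _ in range(warmup):
        # Untimed: absorbs one-off costs such as caching or lazy initialisation.
        run_epoch()
    durations = []
    for _ in range(n_epochs):
        start = time.perf_counter()
        run_epoch()
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in workload (assumption); replace with the real training epoch.
    secs = time_epochs(lambda: sum(i * i for i in range(100_000)))
    print(f"mean seconds per epoch: {mean(secs):.4f}")
```

When training on a GPU, CUDA work is queued asynchronously, so calling `torch.cuda.synchronize()` at the end of `run_epoch` before the clock is read makes the measurement more reliable.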
Is there anything I'm doing wrong or missing?
Hey, the code snippets you show look good to me. Did you get similar results with model architectures other than DistilBert and with other adapter architectures? The Mix-and-Match configuration you're using adds relatively large adapter modules with many parameters. Training speedup should be better with smaller adapter modules (e.g. bottleneck-only adapters such as PfeifferConfig()).
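To give a rough sense of the sizes involved, the sketch below counts the parameters of the bottleneck part alone. It assumes the usual down-projection/up-projection layout with biases, DistilBERT-base dimensions (hidden size 768, 6 layers), and a reduction factor of 16; the function names are hypothetical, and the count deliberately ignores the prefix-tuning part of the Mix-and-Match setup, which adds further parameters on top.

```python
def bottleneck_adapter_params(hidden_size: int, reduction_factor: int) -> int:
    """Parameters in one bottleneck adapter (assumed layout: two linear layers with biases)."""
    bottleneck = hidden_size // reduction_factor
    down = hidden_size * bottleneck + bottleneck   # down-projection weights + bias
    up = bottleneck * hidden_size + hidden_size    # up-projection weights + bias
    return down + up


def total_adapter_params(num_layers: int, adapters_per_layer: int,
                         hidden_size: int = 768, reduction_factor: int = 16) -> int:
    """Total bottleneck parameters across all transformer layers."""
    return num_layers * adapters_per_layer * bottleneck_adapter_params(hidden_size, reduction_factor)


if __name__ == "__main__":
    # mh_adapter=True and output_adapter=True: two adapters per layer.
    print("two adapters per layer:", total_adapter_params(num_layers=6, adapters_per_layer=2))
    # Output adapter only (the bottleneck-only case): one adapter per layer.
    print("one adapter per layer:", total_adapter_params(num_layers=6, adapters_per_layer=1))
```

Under these assumptions the bottleneck-only variant trains about half as many adapter parameters as the configuration with both adapter positions enabled, and both are small next to the roughly 66 million parameters of the full model.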
This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.
This issue was closed because it was stale for 14 days without any activity.