Training an Adapter using own classification head and pytorch training loop
Hello! I want to add the adapter approach to my pre-trained BERT text-classification model, but I did not find a good explanation in the documentation on how to do that. My model class is the following:
class BertClassifier(nn.Module):
    """Bert Model for Classification Tasks."""
    def __init__(self, freeze_bert=True):
        """
        @param bert: a BertModel object
        @param classifier: a torch.nn.Module classifier
        @param freeze_bert (bool): Set `False` to fine-tune the BERT model
        """
        super(BertClassifier, self).__init__()
        # Instantiate BERT model
        # Specify hidden size of BERT, hidden size of our classifier, and number of labels
        self.bert = BertAdapterModel.from_pretrained(PRETRAINED_MODEL)
        self.D_in = 1024
        self.H = 512
        self.D_out = 2
        # Add a new adapter and activate it for training
        self.bert.add_adapter("thermo_cl", set_active=True)
        self.bert.train_adapter(["thermo_cl"])
        # Instantiate the classifier head with a small feed-forward classifier
        self.classifier = nn.Sequential(
            nn.Linear(self.D_in, self.H),
            nn.Tanh(),
            nn.Linear(self.H, self.D_out),
            nn.Tanh()
        )
        # Freeze the BERT backbone
        if freeze_bert:
            for param in self.bert.parameters():
                param.requires_grad = False

    def forward(self, input_ids, attention_mask):
        '''Feed input to BERT and the classifier to compute logits.
        @param input_ids (torch.Tensor): an input tensor with shape (batch_size, max_length)
        @param attention_mask (torch.Tensor): a tensor that holds attention mask
                                              information with shape (batch_size, max_length)
        @return logits (torch.Tensor): an output tensor with shape (batch_size, num_labels)
        '''
        # Feed input to BERT
        outputs = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask)
        # Extract the last hidden state of the `[CLS]` token for the classification task
        last_hidden_state_cls = outputs[0][:, 0, :]
        # Feed the [CLS] representation to the classifier to compute logits
        logits = self.classifier(last_hidden_state_cls)
        return logits
The training loop is the following:
def initialize_model(epochs):
    """Initialize the Bert Classifier, the optimizer and the learning rate scheduler."""
    # Instantiate the Bert Classifier (freeze_bert=False keeps the BERT backbone trainable)
    bert_classifier = BertClassifier(freeze_bert=False)
    # Tell PyTorch to run the model on GPU
    bert_classifier = bert_classifier.to(device)
    # Create the optimizer
    optimizer = AdamW(bert_classifier.parameters(),
                      lr=lr,     # Default learning rate
                      eps=1e-8)  # Default epsilon value
    # Total number of training steps
    total_steps = len(train_dataloader) * epochs
    # Set up the learning rate scheduler
    scheduler = get_linear_schedule_with_warmup(optimizer,
                                                num_warmup_steps=0,  # Default value
                                                num_training_steps=total_steps)
    return bert_classifier, optimizer, scheduler
def train(model, train_dataloader, val_dataloader, valid_loss_min_input, checkpoint_path, best_model_path, start_epochs, epochs, evaluation=True):
    """Train the BertClassifier model."""
    # Start training loop
    logging.info("--Start training...\n")
    # Initialize tracker for minimum validation loss
    valid_loss_min = valid_loss_min_input
    for epoch_i in range(start_epochs, epochs):
        ..............................
        if evaluation == True:
            # After the completion of each training epoch, measure the model's
            # performance on our validation set.
            val_loss, val_accuracy = evaluate(model, val_dataloader)
            # Print performance for the epoch (training loss + validation metrics)
            time_elapsed = time.time() - t0_epoch
            logging.info(f"{epoch_i + 1:^7} | {'-':^7} | {avg_train_loss:^12.6f} | {val_loss:^10.6f} | {val_accuracy:^10.6f} | {time_elapsed:^9.2f}")
            logging.info("-" * 70)
            logging.info("\n")
            # Create checkpoint variable and add important data
            checkpoint = {
                'epoch': epoch_i + 1,
                'valid_loss_min': val_loss,
                'state_dict': model.state_dict(),
                'optimizer': optimizer.state_dict(),
            }
            # Save checkpoint
            save_ckp(checkpoint, False, checkpoint_path, best_model_path)
            ## TODO: save the model if validation loss has decreased
            if val_loss <= valid_loss_min:
                print('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'.format(valid_loss_min, val_loss))
                # Save checkpoint as best model
                save_ckp(checkpoint, True, checkpoint_path, best_model_path)
                valid_loss_min = val_loss
    # Save the adapter once training is complete
    model.save_adapter("./final_adapter", "thermo_cl")
    logging.info("-----------------Training complete--------------------------")

bert_classifier, optimizer, scheduler = initialize_model(epochs=n_epochs)
train(model = bert_classifier....)
As you can see, I have my own personalized classification head, so I don't want to use the .add_classification_head() method. Is it correct to train and activate the adapter in this way? I would like to know if I'm using the adapter properly, and also how to save the checkpoint and my model weights, because at the end of training (where I am supposed to save the adapter) I receive this error:
AttributeError: 'BertClassifier' object has no attribute 'save_adapter'
Thanks for the help!
Hey @Ch-rode,
your adapter adding and activation code looks right. For saving the adapter, you should call model.bert.save_adapter("./final_adapter", "thermo_cl"), since your model is an instance of your custom class and does not itself have a save_adapter() method.
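For illustration, a minimal sketch of how this could look at the end of the training loop (assuming the class, adapter name, and output directory from your snippets):

# Sketch: `model` is an instance of the custom BertClassifier above; the adapter
# methods live on the wrapped BertAdapterModel, i.e. `model.bert`.
model.bert.save_adapter("./final_adapter", "thermo_cl")  # writes adapter weights + adapter config
# Later, e.g. for inference, the adapter can be restored onto a fresh backbone:
# model.bert.load_adapter("./final_adapter")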
Thanks for the answer @calpt! So, to be sure, I don't need to activate a head with the same name as the adapter, right? Because in this tutorial it says something like that. Another question: should I save the adapter only once, after all the training, rather than at every checkpoint? Like the following (thanks for the help!):
def train(model, train_dataloader, val_dataloader, valid_loss_min_input, checkpoint_path, best_model_path, start_epochs, epochs, evaluation=True):
    # Start training loop
    logging.info("--Start training...\n")
    # Initialize tracker for minimum validation loss
    valid_loss_min = valid_loss_min_input
    for epoch_i in range(start_epochs, epochs):
        # =======================================
        #               Training
        # =======================================
        # Print the header of the result table
        logging.info(f"{'Epoch':^7} | {'Batch':^7} | {'Train Loss':^12} | {'Val Loss':^10} | {'Val Acc':^9} | {'Elapsed':^9}")
        # Measure the elapsed time of each epoch
        t0_epoch, t0_batch = time.time(), time.time()
        # Reset tracking variables at the beginning of each epoch
        total_loss, batch_loss, batch_counts = 0, 0, 0
        # Put the model into training mode
        model.train()
        # For each batch of training data...
        for step, batch in enumerate(train_dataloader):
            batch_counts += 1
            # Load batch to GPU
            b_input_ids, b_attn_mask, b_labels = tuple(t.to(device) for t in batch)
            # Zero out any previously calculated gradients
            model.zero_grad()
            # Perform a forward pass. This will return logits.
            logits = model(b_input_ids, b_attn_mask)
            # Compute loss and accumulate the loss values
            loss = loss_fn(logits, b_labels)
            batch_loss += loss.item()
            total_loss += loss.item()
            # Perform a backward pass to calculate gradients
            loss.backward()
            # Clip the norm of the gradients to 1.0 to prevent "exploding gradients"
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            # Update parameters and the learning rate
            optimizer.step()
            scheduler.step()
            # Print the loss values and time elapsed every 500 batches and at the last batch
            if (step % 500 == 0 and step != 0) or (step == len(train_dataloader) - 1):
                # Calculate time elapsed since the last report
                time_elapsed = time.time() - t0_batch
                # Print training results
                logging.info(f"{epoch_i + 1:^7} | {step:^7} | {batch_loss / batch_counts:^12.6f} | {'-':^10} | {'-':^9} | {time_elapsed:^9.2f}")
                # Reset batch tracking variables
                batch_loss, batch_counts = 0, 0
                t0_batch = time.time()
        # Calculate the average loss over the entire training data
        avg_train_loss = total_loss / len(train_dataloader)
        logging.info("-" * 70)
        # =======================================
        #               Evaluation
        # =======================================
        if evaluation == True:
            # After the completion of each training epoch, measure the model's
            # performance on our validation set.
            val_loss, val_accuracy = evaluate(model, val_dataloader)
            # Print performance for the epoch (training loss + validation metrics)
            time_elapsed = time.time() - t0_epoch
            logging.info(f"{epoch_i + 1:^7} | {'-':^7} | {avg_train_loss:^12.6f} | {val_loss:^10.6f} | {val_accuracy:^10.6f} | {time_elapsed:^9.2f}")
            logging.info("-" * 70)
            logging.info("\n")
            # Create checkpoint variable and add important data
            checkpoint = {
                'epoch': epoch_i + 1,
                'valid_loss_min': val_loss,
                'state_dict': model.state_dict(),
                'optimizer': optimizer.state_dict(),
            }
            # Save checkpoint
            save_ckp(checkpoint, False, checkpoint_path, best_model_path)
            ## TODO: save the model if validation loss has decreased
            if val_loss <= valid_loss_min:
                print('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'.format(valid_loss_min, val_loss))
                # Save checkpoint as best model
                save_ckp(checkpoint, True, checkpoint_path, best_model_path)
                valid_loss_min = val_loss
    # Save the adapter once, after training is complete
    model.bert.save_adapter("./final_adapter", "thermo_cl")
    logging.info("-----------------Training complete--------------------------")
To resume the model, I'm doing the following. Is it correct?
bert_classifier, optimizer, scheduler = initialize_model(epochs=n_epochs)
model, optimizer, start_epoch, valid_loss_min = load_ckp(r"./best_model/best_model.pt", bert_classifier, optimizer)
model.load_adapter("./final_adapter", model_name=model)
model.set_active_adapters("thermo_cl")
So, to be sure, I don't need to activate a head with the same name as the adapter, right? Because in this tutorial it says something like that.
You don't have to do this if you're using a custom prediction head on top of the model (as you are doing). The BertAdapterModel provides built-in head implementations for common tasks (such as classification) which you can use if they fit your use case. Keeping a head with the same name as the adapter (as mentioned in the tutorial) enables automatic loading and saving of the head together with the adapter. As you have your own prediction head, you can simply ignore everything about the built-in prediction heads.
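For reference only, a hedged sketch of what the built-in-head route would look like (not needed in your setup; the checkpoint name and label count are placeholders):

from transformers.adapters import BertAdapterModel  # in the newer `adapters` package: from adapters import BertAdapterModel

model = BertAdapterModel.from_pretrained("bert-base-uncased")  # placeholder checkpoint
# Give the head the same name as the adapter so both are saved/loaded together.
model.add_adapter("thermo_cl", set_active=True)
model.add_classification_head("thermo_cl", num_labels=2)
model.train_adapter("thermo_cl")
# save_adapter() can then also store the matching head alongside the adapter weights.
model.save_adapter("./final_adapter", "thermo_cl")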
To resume the model, I'm doing the following. Is it correct?
Yes, your code for saving and loading the adapter looks good to me. However, you don't have to pass model_name to load_adapter() as long as you're loading the adapter from the local file system.
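So, concretely, something along these lines should be enough (a sketch using the paths from your snippet, and going through the wrapped backbone model.bert as with save_adapter):

# Loading from the local file system: no model_name needed.
adapter_name = model.bert.load_adapter("./final_adapter")  # returns the loaded adapter's name, here "thermo_cl"
model.bert.set_active_adapters(adapter_name)               # activate it for the forward pass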
Thanks a lot, it works! However, I am facing trouble when I have to re-load the model (e.g. for testing or inference). I save the model using:
# Saving the model in Hugging Face format
model.save_pretrained('./best_model_hugginface/model_hugginface')
if adapter == 'True':
    # Save only the adapter separately
    model.bert.save_adapter('./best_model_hugginface/final_adapter', 'adapter_v1')
So when I load it using BertClassifier.from_pretrained('./best_model_hugginface/model_hugginface') without loading the adapter (because it is supposed to be included in the full model weights) and I check the structure, I can see the adapter, BUT I get this warning: There are adapters available but none are activated for the forward pass.
This is an extract from the structure:
(adapters): ModuleDict(
  (adapter_v1): Adapter(
    (non_linearity): Activation_Function_Class(
      (f): ReLU()
    )
    (adapter_down): Sequential(
      (0): Linear(in_features=1024, out_features=64, bias=True)
      (1): Activation_Function_Class(
        (f): ReLU()
This is my updated BertClassifier:
class BertClassifierConfig(PretrainedConfig):
    ......

class BertClassifier(PreTrainedModel):
    """Bert Model for Classification Tasks."""
    config_class = BertClassifierConfig

    def __init__(self, config, freeze_bert=True):
        """
        @param bert: a BertModel object
        @param classifier: a torch.nn.Module classifier
        @param freeze_bert (bool): Set `False` to fine-tune the BERT model
        """
        super().__init__(config)
        # Instantiate BERT model
        # Specify hidden size of BERT, hidden size of our classifier, and number of labels
        self.bert = BertAdapterModel.from_pretrained(PRETRAINED_MODEL)
        self.D_in = 1024
        self.H = 512
        self.D_out = 2
        # Add a new adapter and activate it for training
        self.bert.add_adapter("thermo_cl", set_active=True)
        self.bert.train_adapter(["thermo_cl"])
        # Instantiate the classifier head with a small feed-forward classifier
        self.classifier = nn.Sequential(
            nn.Linear(self.D_in, self.H),
            nn.Tanh(),
            nn.Linear(self.H, self.D_out),
            nn.Tanh()
        )
        # Freeze the BERT backbone
        if freeze_bert:
            for param in self.bert.parameters():
                param.requires_grad = False

    def forward(self, input_ids, attention_mask):
        '''Feed input to BERT and the classifier to compute logits.
        @param input_ids (torch.Tensor): an input tensor with shape (batch_size, max_length)
        @param attention_mask (torch.Tensor): a tensor that holds attention mask
                                              information with shape (batch_size, max_length)
        @return logits (torch.Tensor): an output tensor with shape (batch_size, num_labels)
        '''
        # Feed input to BERT
        outputs = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask)
        # Extract the last hidden state of the `[CLS]` token for the classification task
        last_hidden_state_cls = outputs[0][:, 0, :]
        # Feed the [CLS] representation to the classifier to compute logits
        logits = self.classifier(last_hidden_state_cls)
        return logits
I used PretrainedConfig and PreTrainedModel from Hugging Face.
Thanks a lot!
Also, if I print it out:
bert_classifier = BertClassifier.from_pretrained('bestmodel_hugginface/model_hugginface/')
print(bert_classifier.config.adapters.adapters)
I can see it:
{'adapter_v1': 'pfeiffer'}
In #337 I can see your comment:
Also note that when you call model.save_pretrained() on a model with adapters, it will save the full model along with the adapters (in the same file). Thus, you don't need to save adapters separately in this case.
So my question is: how should this warning (There are adapters available but none are activated for the forward pass) be interpreted?
Thanks a lot!
Hey, sorry for taking so long to answer. This warning (There are adapters available but none are activated for the forward pass) usually means that you haven't activated any adapters to be used in the forward pass. While the adapters are re-loaded together with the model automatically, they still have to be activated again to be used: model.set_active_adapters("adapter_v1"). The currently active adapters can be printed using model.active_adapters.
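Putting that together, a small sketch of a reload-for-inference flow under your setup (paths and names taken from your snippets; calls go through the wrapped backbone model.bert, as for save_adapter):

# Reload the full model; the adapter weights come back with it, but stay inactive.
model = BertClassifier.from_pretrained('./best_model_hugginface/model_hugginface')
model.bert.set_active_adapters("adapter_v1")   # activate the restored adapter
print(model.bert.active_adapters)              # check what will be used in the forward pass
model.eval()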
Thanks for the reply. If I use model.set_active_adapters("adapter_v1"), I receive Overwriting existing adapter 'adapter_v1'.
That's interesting; this warning should not occur when calling set_active_adapters(), as this method does not load any adapter weights. Usually, this happens when you load weights for an adapter that is already added to the model, e.g. when an adapter is loaded together with the model using from_pretrained() and, afterwards, is loaded once again using load_adapter() somewhere. However, as long as the last loaded checkpoint is the correct one, this warning will not result in any issues.
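A hedged sketch of the two situations described above (names and paths from this thread):

# Triggers "Overwriting existing adapter 'adapter_v1'.": the adapter is already
# restored by from_pretrained(), and load_adapter() then loads it a second time.
model = BertClassifier.from_pretrained('./best_model_hugginface/model_hugginface')
model.bert.load_adapter('./best_model_hugginface/final_adapter')

# Cleaner path: load the full model once, then only activate the adapter.
model = BertClassifier.from_pretrained('./best_model_hugginface/model_hugginface')
model.bert.set_active_adapters("adapter_v1")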
This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.
This issue was closed because it was stale for 14 days without any activity.