transformers
Return attention_mask in FeatureExtractionPipeline output
Feature request
Return attention_mask as one output of the FeatureExtractionPipeline so that padding token embeddings can be ignored.
Who can help? @Narsil
Motivation
When using the FeatureExtractionPipeline to generate sentence embeddings, the pipeline tokenizes the raw input sentence internally. The output of the pipeline is a tensor of shape [1, seq_len, hidden_dim]; if the input is padded, seq_len equals the tokenizer's max_length or the longest sequence in the batch.
However, when mean pooling the individual token embeddings to obtain a sentence embedding, one may want to use the attention_mask to ignore the padding token embeddings (see the mean pooling example below). But FeatureExtractionPipeline does not return attention_mask as part of its output.
```python
import torch

# Mean pooling - take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
    sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    return sum_embeddings / sum_mask
```
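For concreteness, here is a minimal sketch of the current behaviour described above; the model name is only an illustrative assumption, any encoder model behaves the same way.

```python
from transformers import pipeline

# illustrative model choice (assumption, not tied to this issue)
extractor = pipeline("feature-extraction", model="sentence-transformers/all-MiniLM-L6-v2")

features = extractor("This is a test sentence.", return_tensors=True)
# features has shape [1, seq_len, hidden_dim]; the attention_mask used
# internally for padding is not part of the returned output, so the
# mean_pooling function above cannot be applied without re-tokenizing.
```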
Your contribution
I can submit a pull request for this if it sounds good to you!
This doesn't seem like a use-case for the pipeline though. Since you want access to the processed inputs, you should just use the tokenizer and the model directly.
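A minimal sketch of the suggested tokenizer-plus-model approach, reusing the mean_pooling function above; the model name and sentences are only illustrative assumptions.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# illustrative model choice (assumption); any encoder model works the same way
model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sentences = ["This is an example sentence.", "Each sentence is converted."]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    model_output = model(**encoded)

# encoded["attention_mask"] is available here, so padding embeddings can be ignored
sentence_embeddings = mean_pooling(model_output, encoded["attention_mask"])
```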
Your comment makes sense. Since my goal mostly aligns with the pipeline's existing functionality, I think I will subclass FeatureExtractionPipeline and make small modifications to achieve it. Feel free to close the issue. Thank you!
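One possible shape for such a subclass, assuming the pipeline's preprocess/_forward/postprocess structure; this is an untested sketch for illustration, not the change actually made.

```python
from transformers import FeatureExtractionPipeline

class FeatureExtractionWithMaskPipeline(FeatureExtractionPipeline):
    # keep the embeddings together with the attention_mask produced during preprocessing
    def _forward(self, model_inputs):
        model_outputs = self.model(**model_inputs)
        return {
            "embeddings": model_outputs[0],
            "attention_mask": model_inputs["attention_mask"],
        }

    def postprocess(self, model_outputs, return_tensors=False):
        if return_tensors:
            return model_outputs["embeddings"], model_outputs["attention_mask"]
        return model_outputs["embeddings"].tolist(), model_outputs["attention_mask"].tolist()
```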
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.