fastembed
[Feature]: Last token pooling for causal embedding models
What feature would you like to request?
The Qwen3 embedding models will need something like this (taken from the Qwen3 usage example):
```python
import torch
from torch import Tensor


def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    # With left padding, every sequence ends at the last position.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        # With right padding, pick the hidden state of the last
        # non-padding token in each sequence.
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]
```
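Since fastembed runs models through ONNX Runtime and post-processes NumPy arrays rather than torch tensors, the feature would presumably need a NumPy version of the same logic. A minimal sketch is below; the function name and signature are illustrative, not part of fastembed's API:

```python
import numpy as np


def last_token_pool_np(last_hidden_states: np.ndarray,
                       attention_mask: np.ndarray) -> np.ndarray:
    """Hypothetical NumPy last-token pooling for ONNX Runtime outputs.

    last_hidden_states: (batch, seq_len, hidden) float array
    attention_mask:     (batch, seq_len) 0/1 int array
    """
    # Left padding: every sequence ends at the final position, so the
    # last hidden state is already the pooled embedding.
    if attention_mask[:, -1].sum() == attention_mask.shape[0]:
        return last_hidden_states[:, -1]
    # Right padding: index the last non-padding token per row.
    sequence_lengths = attention_mask.sum(axis=1) - 1
    batch_size = last_hidden_states.shape[0]
    return last_hidden_states[np.arange(batch_size), sequence_lengths]
```

This would slot in as an additional pooling strategy next to the mean/CLS pooling that encoder-style models use, selected per model in the model configuration.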
Is there any additional information you would like to provide?
No response