Fine-tuning transformer-based models and IOB2 format
Hi, I had a question about the NER task and transformer-based models. When fine-tuning with a linear classification layer on top, we perform token-level classification rather than span-level classification, which means the model will not necessarily follow the IOB2 format (B: Beginning, I: Inside, O: Outside). I would like to know what happens when a token is split into subtokens. Is the label of the original token copied to each subtoken? If so, there would clearly be more entities than in the original annotation.
What happens when the final metrics are calculated? Is the label of the first subtoken taken and treated as the label of the original word?
I hope the question is clear :)
Example:
Original labels (1 entity):
Colon  B-Disease
Cancer I-Disease
WordPiece labels (2 entities):
Co     B-Disease
lon    B-Disease
Cancer I-Disease
If the prediction is the following:
Co     B-Disease
lon    I-Disease
Cancer I-Disease
Is that considered a true positive or false negative?
The predictions are on token level. You can choose how the embedding for a token is aggregated if it consists of several subtokens. This is done by setting the subtoken_pooling parameter to either first, last, first_last, or mean. The default is first.
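For reference, here is a minimal sketch of where that parameter is set when fine-tuning a flair SequenceTagger. The choice of corpus (NCBI_DISEASE) and model (bert-base-cased) are just illustrative assumptions, and the exact keyword names may differ slightly between flair versions:

```python
# Minimal sketch: pool subtoken embeddings back into one vector per original token.
# NCBI_DISEASE and bert-base-cased are assumed examples; keyword names may vary
# slightly between flair versions.
from flair.datasets import NCBI_DISEASE
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger

corpus = NCBI_DISEASE()
label_dict = corpus.make_label_dictionary(label_type="ner")

# Each original token ends up with exactly one embedding: if the tokenizer splits
# "Colon" into "Co" + "##lon", the subtoken vectors are combined according to
# subtoken_pooling ("first", "last", "first_last", or "mean"; default "first").
embeddings = TransformerWordEmbeddings(
    model="bert-base-cased",
    fine_tune=True,
    subtoken_pooling="first",
)

# The classification head then predicts one label per original token, so the
# IOB2 sequence (and the entity count) stays aligned with the original words.
tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
)
```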