NER-BERT-pytorch
Something wrong with the total number of entities being evaluated?
For the msra dataset, I noticed that the total number of entities being evaluated does not match the number of entities actually in the data.
As you can see, for the test data, the support (true entities) is:
But the true entity counts are (I also checked the dataset you created, which matches the counts below):
Could you please look into this problem?
Thank you for your attention. I obtained the msra dataset in this repo from the repo, so I cannot guarantee that the data is authoritative. The test data and the statistical results should be consistent; I will look into this issue later.
I found the reason.
Because of the maximum sequence length limit, sequences longer than the limit are truncated, so some of the entities in the original sentences are lost.
Therefore, the evaluation results (support) on the test dataset are slightly different from the original data statistics.
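To illustrate the effect described above, here is a minimal sketch, not the repo's actual code. It assumes BIO-style tags in which every entity starts with a `B-` tag; the helper names `count_entities` and `truncate`, the example tags, and the `max_len` value are all hypothetical and chosen only for illustration.

```python
def count_entities(tags):
    """Count entities in a BIO tag sequence (assumes each entity starts with a B- tag)."""
    return sum(1 for tag in tags if tag.startswith("B-"))


def truncate(tags, max_len):
    """Keep only the first max_len tags, mimicking a maximum sequence length limit."""
    return tags[:max_len]


# Hypothetical sentence with three entities.
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "B-ORG", "I-ORG"]
max_len = 5  # hypothetical limit, shorter than the sentence

full = count_entities(tags)                     # entities in the original sentence
kept = count_entities(truncate(tags, max_len))  # entities that survive truncation

print(f"original: {full}, after truncation: {kept}")
```

Running the same comparison over the whole test set, with the real maximum length, should show roughly how many entities are dropped before evaluation and whether that accounts for the gap in support.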