NER-BERT-pytorch

Something wrong with the total number of entities being evaluated?

Open WaNePr opened this issue 5 years ago • 2 comments

For the msra dataset, I noticed that the total number of entities being evaluated does not match the actual number of entities in the data. For the test data, the reported support (true entities) is:

[Screenshot 2019-10-10 at 18 02 40]

But the true entity counts, which I also checked against the dataset you created, are:

[WeChatWorkScreenshot_b622d7c5-a023-4de6-821e-af26c6e718df]

Could you please look into this problem?

WaNePr • Oct 10 '19 10:10

Thank you for raising this. I obtained the msra dataset in this repo from another repo, so I cannot guarantee that the data is authoritative. The test data and the reported statistics should be consistent; I will look into this issue later.

lemonhu • Oct 30 '19 13:10

I found the reason. Because of the maximum sequence length limit, sequences longer than the limit are truncated, so any entities in the truncated part of the original sentence are lost. As a result, the support reported by the evaluation on the test dataset is slightly lower than the entity count in the original data.

lemonhu • Nov 14 '19 02:11
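
To illustrate the explanation above, here is a minimal sketch of how truncation can lower the reported support. It is not the repo's actual evaluation code. It assumes well-formed BIO tags (every entity starts with a `B-` tag), and the names `count_entities`, `truncated_support`, and `max_len` are hypothetical, chosen only for this example.

```python
def count_entities(tags):
    """Count entities in a BIO tag sequence, assuming each entity starts with a B- tag."""
    return sum(1 for tag in tags if tag.startswith("B-"))


def truncated_support(sentences, max_len):
    """Return (entities in the full data, entities left after truncating to max_len)."""
    full = sum(count_entities(tags) for tags in sentences)
    kept = sum(count_entities(tags[:max_len]) for tags in sentences)
    return full, kept


# Toy data (made up for illustration, not taken from the msra dataset).
sentences = [
    ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "B-ORG", "I-ORG"],
    ["O", "B-LOC", "O"],
]

full, kept = truncated_support(sentences, max_len=5)
print(f"entities in original data: {full}, entities after truncation: {kept}")
```

Running a comparison like this over the test file with the maximum sequence length used for evaluation should show whether the gap in support comes from the truncated sentences alone.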