ner_incomplete_annotation

Using the same dataset for training and evaluation can't reach a 100% F1 score

Open possible1402 opened this issue 1 year ago • 2 comments

Hi, I want to make sure that the model architecture works well, so I used the same dataset (the gold dataset, without removing any entities) both to train and to evaluate the model. Ideally the model should overfit and the F1 score should reach 100%. With the CoNLL dataset this works as expected, but when I switch to my own dataset the F1 score only gets to around 80%, not 100%, and I'm confused by that result. The experiment log is at this link: wandb link. I also trained on the train set (18,000 samples) and evaluated on the dev set (3,000 samples), where the F1 score reaches about 60%. I really hope you can help me out. Thanks!

possible1402 avatar Sep 07 '22 12:09 possible1402
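(A side note for readers reproducing this sanity check: NER F1 is usually entity-level, so a span that is only partially tagged counts as fully wrong. The sketch below uses the seqeval package, not anything from this repo, and the toy sentences are made up; it only illustrates how span-level F1 can sit well below token accuracy.)

```python
# Entity-level (span) F1, as reported by seqeval: a span is only counted
# as correct when its boundary and type both match the gold span exactly.
from seqeval.metrics import f1_score, classification_report

# Toy gold labels and hypothetical model predictions for two sentences.
gold = [["B-PER", "I-PER", "O", "B-LOC"],
        ["O", "B-ORG", "I-ORG", "O"]]
pred = [["B-PER", "I-PER", "O", "B-LOC"],
        ["O", "B-ORG", "O", "O"]]        # ORG span only partially tagged

# 7 of 8 tokens are tagged correctly, yet entity-level F1 is 2/3:
# the truncated ORG span counts as both a false positive and a false negative.
print(f1_score(gold, pred))
print(classification_report(gold, pred))
```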

Thanks. Can you let me know which version of this repo you are using (PyTorch or DyNet)?

allanj avatar Sep 08 '22 05:09 allanj

Are you able to overfit your dataset with a normal LSTM-CRF model?

allanj avatar Sep 08 '22 05:09 allanj
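(For reference, a minimal overfit test with a plain BiLSTM-CRF might look like the sketch below. It uses the pytorch-crf package rather than this repo's code, and the toy token/tag ids are invented; the point is only that a healthy data pipeline should let such a model memorize a couple of sentences exactly.)

```python
# Minimal BiLSTM-CRF overfit test on two toy sentences.
# Requires: pip install pytorch-crf
import torch
import torch.nn as nn
from torchcrf import CRF

class BiLSTMCRF(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=32, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden // 2, bidirectional=True,
                            batch_first=True)
        self.proj = nn.Linear(hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, tokens, tags, mask):
        emissions = self.proj(self.lstm(self.emb(tokens))[0])
        return -self.crf(emissions, tags, mask=mask)  # negative log-likelihood

    def decode(self, tokens, mask):
        emissions = self.proj(self.lstm(self.emb(tokens))[0])
        return self.crf.decode(emissions, mask=mask)

# Two toy sentences, already mapped to integer ids (0 = <pad>).
tokens = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 0]])
tags   = torch.tensor([[1, 2, 0, 3], [0, 4, 5, 0]])
mask   = tokens.ne(0)

model = BiLSTMCRF(vocab_size=8, num_tags=6)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for epoch in range(200):              # plenty of steps to memorize 2 sentences
    opt.zero_grad()
    loss = model.loss(tokens, tags, mask)
    loss.backward()
    opt.step()

# Decoded sequences should exactly reproduce `tags` at the unmasked positions.
print(model.decode(tokens, mask))
print(tags.tolist())
```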

Sorry for the late response. I'm using the PyTorch version. I actually figured this out just by running more epochs: the F1 score does reach roughly 100%. I think this model simply needs more training time to fit the data than a normal LSTM-CRF model does. Thanks for reaching out, by the way.

possible1402 avatar Nov 02 '22 09:11 possible1402