
Better results on a small number of samples

grinay opened this issue 1 year ago

Hello. Thanks for your effort on the flair framework; I achieved the best and fastest results with it compared to other tools. It's amazing. I have a question I can't figure out yet. I ran a hyperparameter search over my train/dev/test data and found the best-performing embedding stack and parameters for my task: `StackedEmbeddings([WordEmbeddings('crawl'), CharacterEmbeddings(), FlairEmbeddings('news-forward'), FlairEmbeddings('news-backward')])`. To be concrete, I train a model to recognize addresses in text (NER). I'm trying different dropouts and amounts of downsampling, and I found something I can't comprehend. All runs are trained for 4 epochs (I tried more, but it starts overfitting, I guess; I describe this below as a question).
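For context, a minimal sketch of what such a training setup typically looks like in flair (~0.11-era API); the corpus path, column format, and output directory are hypothetical placeholders, not taken from the issue:

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import (
    CharacterEmbeddings,
    FlairEmbeddings,
    StackedEmbeddings,
    WordEmbeddings,
)
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# hypothetical CoNLL-style corpus with one ADDRESS entity type
corpus = ColumnCorpus("data/", {0: "text", 1: "ner"})
label_dict = corpus.make_label_dictionary(label_type="ner")

# the embedding stack found by the hyperparameter search
embeddings = StackedEmbeddings([
    WordEmbeddings("crawl"),
    CharacterEmbeddings(),
    FlairEmbeddings("news-forward"),
    FlairEmbeddings("news-backward"),
])

tagger = SequenceTagger(
    hidden_size=256,  # flair's default RNN size
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
)

trainer = ModelTrainer(tagger, corpus)
trainer.train("resources/taggers/address-ner", max_epochs=4)
```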

When I train the NN on a big set with 2500 train / 250 dev / 250 test samples, I get good scores:

Results:
- F-score (micro) 0.9545
- F-score (macro) 0.9545
- Accuracy 0.913

By class:
              precision    recall  f1-score   support

     ADDRESS     0.9545    0.9545    0.9545       286

   micro avg     0.9545    0.9545    0.9545       286
   macro avg     0.9545    0.9545    0.9545       286
weighted avg     0.9545    0.9545    0.9545       286 

The results look good; however, when I run this model over my test document (unseen data), it can't find anything.
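(For reference, a minimal sketch of applying the trained model to unseen text; the model path is the hypothetical one from the training sketch above, and the example sentence is invented:)

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load the model trained above (hypothetical path)
tagger = SequenceTagger.load("resources/taggers/address-ner/final-model.pt")

# invented example sentence
sentence = Sentence("Please deliver the parcel to 221B Baker Street, London.")
tagger.predict(sentence)

# print every predicted span with its label and confidence
for span in sentence.get_spans("ner"):
    print(span.text, span.tag, span.score)
```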

Then I downsample my data to only 10%, i.e. 250/25/25.
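(Presumably this is flair's built-in `Corpus.downsample`; a one-line sketch, where the `0.1` factor matches the 10% above:)

```python
# downsample the corpus to 10%; by default flair shrinks the
# train, dev and test splits alike (matching the 250/25/25 above)
corpus = corpus.downsample(0.1)
```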

This gives me the following scores:

Results:
- F-score (micro) 0.8148
- F-score (macro) 0.8148
- Accuracy 0.6875

By class:
              precision    recall  f1-score   support

     ADDRESS     0.8800    0.7586    0.8148        29

   micro avg     0.8800    0.7586    0.8148        29
   macro avg     0.8800    0.7586    0.8148        29
weighted avg     0.8800    0.7586    0.8148        29

When I run this model over the same document, it does pretty well and finds addresses; certainly not with 100% accuracy, but very well. This leaves several questions I can't answer yet:

  1. Why do I get bad results on real data when training on the big dataset, but not on the small one?
  2. If I train the model for more than 4-5 epochs, it gets high scores in evaluation but can't find anything in the real task (even with downsampling). How can that be? Looking at the loss/train plots (see the plotting sketch below), I can't say for sure that it's overfitting. Please help me understand what is going on under the hood.
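(Aside on the plots: flair's trainer writes a `loss.tsv` into the model folder and ships a small plotting helper; a sketch, assuming the hypothetical output path from the training sketch above:)

```python
from flair.visual.training_curves import Plotter

# loss.tsv is written by ModelTrainer.train() into the base path
plotter = Plotter()
plotter.plot_training_curves("resources/taggers/address-ner/loss.tsv")
```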

grinay avatar Aug 25 '22 04:08 grinay

Hello @grinay, it's a bit difficult to answer, but it sounds like overfitting. Perhaps the CharacterEmbeddings learn very specific patterns that only exist in your training data. You could try (a sketch of both follows below):

  1. leaving the CharacterEmbeddings out, to check if something similar happens;
  2. reducing the RNN size, to give the model less capacity.
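A minimal sketch of both suggestions combined; the `hidden_size` value is only illustrative, and the corpus setup repeats the hypothetical one from above:

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import FlairEmbeddings, StackedEmbeddings, WordEmbeddings
from flair.models import SequenceTagger

# hypothetical corpus, as in the training sketch above
corpus = ColumnCorpus("data/", {0: "text", 1: "ner"})
label_dict = corpus.make_label_dictionary(label_type="ner")

# 1. the same stack, but without CharacterEmbeddings
embeddings = StackedEmbeddings([
    WordEmbeddings("crawl"),
    FlairEmbeddings("news-forward"),
    FlairEmbeddings("news-backward"),
])

# 2. a smaller RNN than the default hidden_size=256
tagger = SequenceTagger(
    hidden_size=128,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
)
```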

But this is only a guess: if you share a sample of the data or the training logs, maybe we can tell you more.

alanakbik avatar Aug 30 '22 08:08 alanakbik

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jan 07 '23 13:01 stale[bot]