flair
Better results on a small number of samples
Hello. Thanks for your effort on the flair framework — I achieved the best and fastest results with it compared to other tools. It's amazing. I have a question which I can't figure out yet.

I ran a parameter search over my train/dev/test data and found the best-performing embedding stack and parameters for my task:

`StackedEmbeddings([WordEmbeddings('crawl'), CharacterEmbeddings(), FlairEmbeddings('news-forward'), FlairEmbeddings('news-backward')])`

To be concrete, I train the model to recognise addresses in text (NER). I'm trying different dropouts and downsampling, and I found something I can't comprehend. All runs are trained for 4 epochs (I tried more, but then it starts overfitting, I guess; see the questions below). When I train the NN on the big set, with a 2500 train / 250 dev / 250 test split, I get good scores:
```
Results:
- F-score (micro) 0.9545
- F-score (macro) 0.9545
- Accuracy        0.913

By class:
              precision    recall  f1-score   support
     ADDRESS     0.9545    0.9545    0.9545       286
   micro avg     0.9545    0.9545    0.9545       286
   macro avg     0.9545    0.9545    0.9545       286
weighted avg     0.9545    0.9545    0.9545       286
```
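For reference, the training setup described above can be sketched with the flair API roughly like this. This is a minimal sketch, not the exact script from the issue: the corpus path, column format, and `hidden_size` are assumptions, and running it requires downloading the pretrained embeddings.

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import (CharacterEmbeddings, FlairEmbeddings,
                              StackedEmbeddings, WordEmbeddings)
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# hypothetical corpus layout: CoNLL-style columns, token in column 0, NER tag in column 1
corpus = ColumnCorpus('data/addresses', {0: 'text', 1: 'ner'})

# the embedding stack found by the parameter search
embeddings = StackedEmbeddings([
    WordEmbeddings('crawl'),
    CharacterEmbeddings(),
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
])

tagger = SequenceTagger(
    hidden_size=256,  # flair's default; the issue does not state the actual value
    embeddings=embeddings,
    tag_dictionary=corpus.make_tag_dictionary(tag_type='ner'),
    tag_type='ner',
)

# 4 epochs, as in the experiments described above
ModelTrainer(tagger, corpus).train('resources/taggers/address-ner', max_epochs=4)
```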
The results look good; however, when I run this model over my test document (unseen data), it can't find anything.
Then I downsample my data to only 10%, i.e. a 250 train / 25 dev / 25 test split, which gives me:
```
Results:
- F-score (micro) 0.8148
- F-score (macro) 0.8148
- Accuracy        0.6875

By class:
              precision    recall  f1-score   support
     ADDRESS     0.8800    0.7586    0.8148        29
   micro avg     0.8800    0.7586    0.8148        29
   macro avg     0.8800    0.7586    0.8148        29
weighted avg     0.8800    0.7586    0.8148        29
```
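Flair can produce such a reduced split directly via `Corpus.downsample`, which by default shrinks all three splits by the given fraction. A sketch, with a hypothetical corpus path and column format:

```python
from flair.datasets import ColumnCorpus

# hypothetical corpus location and CoNLL-style column layout
corpus = ColumnCorpus('data/addresses', {0: 'text', 1: 'ner'})

# keep only 10% of each split, giving roughly 250 train / 25 dev / 25 test
small_corpus = corpus.downsample(0.1)
```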
When I run this model over the same document, it does pretty well and finds addresses — not with 100% accuracy, for sure, but very well. There are several questions which I can't answer yet:
- Why do I get bad results on real data when I train on the big dataset, but not on the small one?
- If I train the model for more than 4-5 epochs, it gets a high score in evaluation but can't find anything on the real task (even with downsampling). How can that be? Looking at the loss/training plots, I can't say for sure that it's overfitting. Please help me understand what is going on under the hood.
Hello @grinay, it's a bit difficult to answer, but it sounds like overfitting. Perhaps the CharacterEmbeddings learn very specific patterns that only exist in your training data. You could try:
- leaving the CharacterEmbeddings out, to check if something similar happens
- reducing the RNN size, to give the model less capacity

But this is only a guess: if you share a sample of the data, or the training logs, maybe we can tell you more.
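The two suggestions above could look like this in code. A sketch only: the corpus path and column format are placeholders, and `hidden_size=128` is just an example of a smaller value than flair's default of 256.

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import (FlairEmbeddings, StackedEmbeddings,
                              WordEmbeddings)
from flair.models import SequenceTagger

# hypothetical corpus, as before
corpus = ColumnCorpus('data/addresses', {0: 'text', 1: 'ner'})

# 1) leave CharacterEmbeddings out of the stack
embeddings = StackedEmbeddings([
    WordEmbeddings('crawl'),
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
])

# 2) reduce the RNN size to give the model less capacity
tagger = SequenceTagger(
    hidden_size=128,
    embeddings=embeddings,
    tag_dictionary=corpus.make_tag_dictionary(tag_type='ner'),
    tag_type='ner',
)
```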
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.