NER-using-Deep-Learning

Fixing Imbalanced Data

ArmandGiraud opened this issue on May 25, 2017 · 12 comments

The NER corpus includes many more 'O' labels than any entity labels. How can we fix this using Keras? I tried sample_weight to adjust the loss function during training, but it does not appear to fix the problem fully. What would you suggest? Thanks!

ArmandGiraud avatar May 25 '17 13:05 ArmandGiraud

In the case of the Hindi data, there are certainly many 'O' entries. Fixing this completely is not really possible, as we would have to go through the entire dataset or create a new one (an extreme task). We can only apply some heuristics, like keeping only those sentences which contain a certain minimum number of named entities, using sentences with max_len <= threshold, etc. I don't understand what you mean by fixing this with Keras, though. Can you explain more?
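The filtering heuristics above can be sketched like this (names and data layout are illustrative, not from the repo; a sentence is assumed to be a list of (token, tag) pairs with 'O' marking non-entity tokens):

```python
# Keep only sentences that contain at least `min_entities` named-entity
# tokens and are at most `max_len` tokens long (illustrative sketch).
def filter_sentences(sentences, min_entities=1, max_len=50):
    kept = []
    for sent in sentences:
        n_entities = sum(1 for _, tag in sent if tag != 'O')
        if n_entities >= min_entities and len(sent) <= max_len:
            kept.append(sent)
    return kept

# Example: the second sentence has no entities, so it is dropped.
data = [
    [('John', 'B-PER'), ('lives', 'O'), ('in', 'O'), ('Delhi', 'B-LOC')],
    [('It', 'O'), ('rained', 'O'), ('today', 'O')],
]
print(len(filter_sentences(data)))  # 1
```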

pandeydivesh15 avatar May 25 '17 13:05 pandeydivesh15

Actually, that was unclear on my part. When I try to train the model on the English CoNLL dataset, the classifier only predicts the 'O' label, and this yields a high accuracy (around 97%). [screenshot: class imbalance evidence]

Maybe I'm just doing something wrong, but I don't see what. I have already encountered class imbalance in other ML cases, but I'm wondering if there is a preferred solution for the NER problem. There are many ways of addressing it, such as oversampling, undersampling, or SMOTE, as well as options within Keras itself, such as setting class weights in the loss function.
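For reference, the per-token weighting idea can be sketched as follows (tag ids, weights, and shapes are illustrative assumptions, not from the repo; Keras accepts a weight per timestep when compiled with sample_weight_mode='temporal'):

```python
# Down-weight 'O' tokens so the loss is not dominated by them.
# `y` holds one integer tag id per token; id 0 stands for 'O' here.
def make_sample_weights(y, o_id=0, o_weight=0.1):
    """Weight 'O' tokens by `o_weight` and entity tokens by 1.0."""
    return [[o_weight if tag == o_id else 1.0 for tag in sent]
            for sent in y]

y = [[0, 0, 1, 0],
     [2, 0, 0, 0]]
w = make_sample_weights(y)
print(w[0])  # [0.1, 0.1, 1.0, 0.1]

# With Keras (version-dependent, sketch only):
# model.compile(loss='categorical_crossentropy', optimizer='adam',
#               sample_weight_mode='temporal')
# model.fit(X, y_onehot, sample_weight=np.array(w), ...)
```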

ArmandGiraud avatar May 25 '17 14:05 ArmandGiraud

That image suggests that you are surely doing something wrong. What did the output look like while you were training the model with Keras? Was validation accuracy increasing steadily (at a reasonable rate) and loss decreasing at a good rate? For handling class imbalance, you can try something like I suggested in my previous comment.

pandeydivesh15 avatar May 25 '17 14:05 pandeydivesh15

I tried to run the script with the default settings, as found in english_NER.ipynb. The accuracy (and log loss) is stuck at 97.3% from the first epoch. I'm trying to figure out what is going wrong.
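A quick way to see why that 97.3% is misleading: if roughly 97% of tokens are 'O', a model that predicts 'O' everywhere already reaches that number. Measuring accuracy restricted to non-'O' tokens exposes the problem (the data below is made up for illustration):

```python
# Accuracy over entity tokens only, ignoring the dominant 'O' class.
def entity_accuracy(y_true, y_pred, o_tag='O'):
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != o_tag]
    if not pairs:
        return 0.0
    return sum(t == p for t, p in pairs) / len(pairs)

y_true = ['O'] * 97 + ['B-PER', 'B-LOC', 'B-ORG']
y_pred = ['O'] * 100  # degenerate model: predicts 'O' everywhere

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(overall)                          # 0.97
print(entity_accuracy(y_true, y_pred))  # 0.0
```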

ArmandGiraud avatar May 25 '17 18:05 ArmandGiraud

Sorry for the late reply. Were you also getting very low loss values (in negative powers of 10) and NaN values during training?

pandeydivesh15 avatar May 29 '17 07:05 pandeydivesh15

Hello Divesh, I have a very low loss from the first epoch. I have attached a capture of the training logs: [screenshot: ner_with_deep_learning]

The only thing I changed was adding a few parentheses to the print calls, since I'm running your scripts with Python 3. Maybe I'm also using different versions of Keras and TensorFlow: I have Keras 2.0.0 and TensorFlow 1.0.1 installed on 64-bit Windows. Which versions did you use initially? Thanks for helping.

ArmandGiraud avatar May 29 '17 20:05 ArmandGiraud

The problem is the version numbers. I should have provided a requirements.txt. I used Keras==1.2.1 and tensorflow-gpu==0.12.1. Though I had TensorFlow with GPU support, you can avoid that by installing plain tensorflow==0.12.1. Try this in a new environment and let me know. As for Python 3, some problems can occur while handling Unicode strings, but in our case the chances are low.
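The pinned versions from this comment, written out as a requirements.txt (CPU-only variant):

```
Keras==1.2.1
tensorflow==0.12.1
```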

pandeydivesh15 avatar May 30 '17 03:05 pandeydivesh15

I got a similar issue where everything is predicted as 'O' for the English dataset, but mine is even worse: the losses are all NaN from the beginning. I will try matching the versions of Keras and TensorFlow. Do you have any other advice on this issue? Thanks.

jenniferzhu avatar Aug 21 '17 01:08 jenniferzhu

A follow-up on that: matching the versions of TensorFlow and Keras does not seem to solve my loss: nan issue. I am wondering if this is due to GPU vs. CPU?

jenniferzhu avatar Aug 21 '17 01:08 jenniferzhu

It turns out that I am now getting the same result as @ArmandGiraud. @pandeydivesh15, what accuracy did you get?

jenniferzhu avatar Aug 27 '17 18:08 jenniferzhu

I trained one model just now. Output in my case: [screenshot attached]

pandeydivesh15 avatar Aug 27 '17 19:08 pandeydivesh15

Still having this issue.

sayantanbbb avatar Jan 07 '22 11:01 sayantanbbb