crnn-relation-classification

How's your model's performance on SemEval 2010 Task 8?

Open FrankWork opened this issue 6 years ago • 4 comments

Hi, have you tried your model on SemEval 2010 Task 8? I ran it, but the test accuracy is only 16.79%.

FrankWork, Jan 08 '18 05:01

I haven't tried it on the Task 8 data, since I was only focusing on the biomedical domain at the time. However, I'm pretty sure it wouldn't be as low as you say. In fact, the Task 8 data is arguably easier to classify than the i2b2 data, since the sentences are much more "structured" and the class distribution is more uniform. I will try to run it on this dataset when I get time. I'll keep this issue open till then.

desh2608, Jan 08 '18 06:01

I forked your code and added support for the Task 8 data.

The test accuracy is now 49.50%, while the accuracy of a CNN model without position embeddings is 59.15%. Can you improve it?

git clone https://github.com/FrankWork/crnn-relation-classification.git
python train_crnn.py
python train_cnn.py

FrankWork, Jan 08 '18 10:01

One thing I saw in your code is that you are using randomly initialized word embeddings. The CRNN already has somewhat more parameters than the CNN, and the trainable word embedding parameters add to that. This is probably why the CRNN is overfitting the training data so much. Could you try using pretrained embeddings (Word2Vec or GloVe) and setting trainable=False in the model?
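A minimal sketch of that suggestion, assuming a Keras model and GloVe-style text vectors ("word v1 v2 ..." per line). The helper names (`load_vectors`, `build_embedding_matrix`, `make_frozen_embedding`), the toy vectors, and the `word_index` mapping are hypothetical and not taken from the repository; index 0 is assumed to be reserved for padding.

```python
import random


def load_vectors(lines):
    """Parse GloVe-style text lines ("word v1 v2 ...") into {word: [floats]}."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim, seed=0):
    """Row i holds the vector of the word with index i; row 0 (padding) stays zero.

    Words without a pretrained vector get small random values (an assumption;
    zeros or a shared "unknown" vector are other common choices).
    """
    rng = random.Random(seed)
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is None or len(vec) != dim:
            vec = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
        matrix[idx] = list(vec)
    return matrix


def make_frozen_embedding(matrix):
    """Wrap the matrix in a Keras Embedding layer whose weights are not updated."""
    # Imported lazily so the helpers above run without TensorFlow installed.
    import numpy as np
    from tensorflow import keras

    weights = np.asarray(matrix, dtype="float32")
    return keras.layers.Embedding(
        input_dim=weights.shape[0],
        output_dim=weights.shape[1],
        embeddings_initializer=keras.initializers.Constant(weights),
        trainable=False,  # keep the pretrained vectors fixed during training
    )


# Toy data standing in for a real GloVe file and the dataset vocabulary.
toy_glove = ["the 0.1 0.2 0.3", "cause 0.4 0.5 0.6"]
word_index = {"the": 1, "cause": 2, "effect": 3}
matrix = build_embedding_matrix(word_index, load_vectors(toy_glove), dim=3)
```

With a real vector file, `load_vectors(open(path, encoding="utf-8"))` would replace the toy list, and the layer returned by `make_frozen_embedding(matrix)` would take the place of the randomly initialized embedding layer. Freezing removes the embedding weights from the trainable parameter count, which is the point of the suggestion above.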

desh2608, Jan 08 '18 11:01

I will try.

FrankWork, Jan 08 '18 14:01