crnn-relation-classification
How's your model's performance on SemEval 2010 Task 8?
Hi, have you tried your model on SemEval 2010 Task 8? I ran it, but the test accuracy is only 16.79%.
I haven't tried it on the Task 8 data, since I was focusing only on the biomedical domain at the time. However, I'm pretty sure the accuracy wouldn't be as low as you say. In fact, the Task 8 data is arguably easier to classify than the i2b2 data, since the sentences are much more "structured" and the class distribution is more uniform. I will try to run it on this dataset when I get time. I'll keep this issue open until then.
I forked your code and added support for the Task 8 data.
The test accuracy is now 49.50%, while the accuracy of the CNN model without position embeddings is 59.15%. Can you improve it?
git clone https://github.com/FrankWork/crnn-relation-classification.git
python train_crnn.py
python train_cnn.py
One thing I noticed in your code is that you are using randomly initialized word embeddings. The CRNN already has somewhat more parameters than the CNN, and the trainable word embedding parameters add to that. This is probably why the CRNN is overfitting the training data so much. Could you try using pretrained embeddings (Word2Vec or GloVe) and setting trainable=False in the model?
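To make the suggestion concrete, here is a minimal sketch of the framework-independent part: turning a GloVe-style text file into an embedding matrix aligned with the vocabulary. The helper names (`load_vectors`, `build_embedding_matrix`), the toy vectors, and the convention that word indices start at 1 with row 0 reserved for padding are all assumptions for illustration, not taken from the repository.

```python
import io
import random


def load_vectors(lines):
    """Parse GloVe-style text lines ("word v1 v2 ...") into a dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim, seed=0):
    """Return (matrix, missing).

    Row i of matrix holds the vector for the word whose index is i.
    Row 0 stays all zeros (assumed padding index). Words without a
    pretrained vector get a small random vector and are reported in
    `missing` so coverage can be checked.
    """
    rng = random.Random(seed)
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    missing = []
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = list(vec)
        else:
            matrix[idx] = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
            missing.append(word)
    return matrix, missing


# Toy stand-in for a real pretrained file (hypothetical values).
toy_file = io.StringIO("patient 0.1 0.2 0.3\ncaused 0.4 0.5 0.6\n")
word_index = {"patient": 1, "caused": 2, "flurbiprofen": 3}

vectors = load_vectors(toy_file)
matrix, missing = build_embedding_matrix(word_index, vectors, dim=3)
```

The resulting matrix would then be used as the initial weights of the embedding layer, with trainable=False so those parameters stay frozen. How the weights are passed in depends on the framework and version, so check its documentation. A long `missing` list usually points to a tokenization or casing mismatch between the dataset and the pretrained vocabulary.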
I will try.