
Reproducibility after updating run.sh

Open VPeterV opened this issue 6 years ago • 3 comments

Hi, thanks for sharing! I have seen issue https://github.com/INK-USC/PLE/issues/3, and the run.sh I am using has already been updated with the correct parameters from your newest run.sh. But I still run into the same problem described in that issue.

I still cannot reproduce the experimental results reported in Table 9 of the paper.

And here is what I got:

For BBN dataset:

Evaluate on test data...
Predicted labels (embedding):
prediction:13276, ground:13276
accuracy: 0.655302802049
macro_precision, macro_recall, macro_f1: 0.80969198831 0.738642569298 0.772537128728
micro_precision, micro_recall, micro_f1: 0.691044043291 0.730320393923 0.710139554854

For OntoNotes dataset:

Evaluate on test data... 
Predicted labels (embedding): 
prediction:9514, ground:9514 
accuracy: 0.507778011352
macro_precision, macro_recall, macro_f1: 0.747326308598 0.616188643052 0.675451310714
micro_precision, micro_recall, micro_f1: 0.719988123515 0.535425351516 0.614140119916

I have already run the program several times, but it does not seem to produce better results. Do you have any insight into what is going on here? Looking forward to your reply. Thanks!

VPeterV avatar Feb 28 '19 05:02 VPeterV

I also have this problem, and I see that PLE+FIGER is the best method in the paper. Do we need to set some parameter to select the FIGER classifier? Thanks.

Refrainlhy avatar Jun 03 '19 14:06 Refrainlhy

@shanzhenren, hoping for your reply.

Refrainlhy avatar Jun 03 '19 15:06 Refrainlhy

Hi @Refrainlhy, I would suggest tuning the hyper-parameters of the FIGER classifier in "Classifier.py" to see if you can get better results. The hyper-parameters we provided there may be out of date and may not reproduce the reported numbers. In addition, the performance is somewhat sensitive to the random seed used for embedding initialization.
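As a rough illustration of what I mean, a tuning loop with a fixed seed could look like the sketch below. This is not the actual Classifier.py code: it uses a scikit-learn classifier and synthetic data as stand-ins, and the grid keys (`C`, `max_iter`) are placeholders for whatever hyper-parameters Classifier.py actually exposes.

```python
# Sketch of a hyper-parameter sweep with a fixed random seed.
# The classifier, data, and grid keys are stand-ins (scikit-learn),
# NOT the actual Classifier.py from PLE -- adapt to the repo's code.
import itertools
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

SEED = 42                  # fix the seed; results are seed-sensitive
np.random.seed(SEED)

# Placeholder data; replace with the PLE features for BBN / OntoNotes.
X, y = make_classification(n_samples=2000, n_features=50, random_state=SEED)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=SEED)

# Hypothetical grid -- substitute the hyper-params from Classifier.py.
grid = {"C": [0.01, 0.1, 1.0, 10.0], "max_iter": [200, 500]}

best_f1, best_cfg = -1.0, None
for values in itertools.product(*grid.values()):
    cfg = dict(zip(grid.keys(), values))
    clf = LogisticRegression(random_state=SEED, **cfg)
    clf.fit(X_tr, y_tr)
    f1 = f1_score(y_te, clf.predict(X_te), average="micro")
    if f1 > best_f1:
        best_f1, best_cfg = f1, cfg

print(f"best micro-F1 {best_f1:.4f} with {best_cfg}")
```

Fixing the seed everywhere at least makes runs comparable while you sweep the grid; once you find a good configuration, you can then vary the seed to check how much of the gap is initialization noise.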

shanzhenren avatar Jun 04 '19 19:06 shanzhenren