EMNLP2018-JMEE

Evaluation function is not right

Open airkid opened this issue 6 years ago • 8 comments

https://github.com/lx865712528/JMEE/blob/494451d5852ba724d273ee6f97602c60a5517446/enet/testing.py#L72
If I add the following check before this line:

```python
assert len(arguments) == len(arguments_)
```

there will be an assertion error. I believe this is because `arguments` holds the gold arguments while `arguments_` holds only the predicted arguments, whose length changes dynamically during training.
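A minimal sketch of the mismatch, with hypothetical `(start, end, role)` tuples (not values from the actual data):

```python
arguments = [(3, 5, 11), (7, 9, 9)]              # gold arguments
arguments_ = [(0, 2, 2), (3, 5, 11), (7, 9, 9)]  # predicted arguments

# zip() silently truncates to the shorter list, so the original loop
# compares misaligned pairs; the added assert exposes the mismatch.
assert len(arguments) == len(arguments_)  # raises AssertionError (2 != 3)
```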

airkid avatar Mar 01 '19 09:03 airkid

This computes the score incorrectly: if the model predicts a spurious entity before all the correct ones, the predictions are no longer aligned with the gold list and the score is 0, as shown in this example:

- gold roles: `[(3,5,11), (7,9,9)]`
- predicted roles: `[(0,2,2), (3,5,11), (7,9,9)]`
- first iteration: compare `(3,5,11)` with `(0,2,2)` -> fail
- second iteration: compare `(7,9,9)` with `(3,5,11)` -> fail, even though `(3,5,11)` was in the gold annotations

Here is a working version that also generates a per-class report (it requires tabulate):

calculate_sets_1.txt
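For readers who cannot open the attachment, a minimal sketch of the failure mode and the set-based fix, using the role tuples from the example above (my own illustration, not the attached file):

```python
gold = [(3, 5, 11), (7, 9, 9)]
preds = [(0, 2, 2), (3, 5, 11), (7, 9, 9)]

# Positional comparison, as in the repo's zip-based loop (item[2] is the
# role): every pair is misaligned, so nothing matches.
positional = sum(1 for g, p in zip(gold, preds) if g[2] == p[2])
print(positional)  # 0

# Matching each prediction against *any* gold annotation instead.
set_based = len(set(gold) & set(preds))
print(set_based)  # 2
```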

DorianKodelja avatar Mar 01 '19 16:03 DorianKodelja

Hi @airkid @DorianKodelja, I came to the same conclusion as you. According to the DMCNN paper:

> An argument is correctly classified if its event subtype, offsets and argument role match those of any of the reference argument mentions

```python
for item, item_ in zip(arguments, arguments_):
```

The code above from this repo does not match that idea, so I replaced that line with:

```python
ct += len(set(arguments) & set(arguments_))  # count any prediction that matches a gold argument
# for item, item_ in zip(arguments, arguments_):
#     if item[2] == item_[2]:
#         ct += 1
```
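For completeness, here is a sketch of how that set-based count could feed micro-averaged precision/recall/F1. The function name and the per-sentence list-of-tuples structure are my assumptions for illustration, not the repo's exact API:

```python
def argument_prf(gold_args, pred_args):
    """Micro-averaged P/R/F1 where a prediction counts as correct if it
    matches any gold (offsets, role) mention. gold_args and pred_args are
    lists of per-sentence lists of hashable tuples. Hypothetical helper."""
    ct = num_gold = num_pred = 0
    for gold, pred in zip(gold_args, pred_args):
        ct += len(set(gold) & set(pred))  # matches against any gold mention
        num_gold += len(gold)
        num_pred += len(pred)
    p = ct / num_pred if num_pred else 0.0
    r = ct / num_gold if num_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

Note that converting to sets collapses duplicate mentions within a sentence, which fits the "any of the reference argument mentions" reading of the criterion.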

mikelkl avatar Mar 07 '19 10:03 mikelkl

Hi @mikelkl, I believe this is a correct implementation of the F1 score for this task.
Have you reproduced the experiment? I can only reach an F1 score below 0.4 on the test data.

airkid avatar Mar 07 '19 10:03 airkid

Hi @airkid, I got a slightly higher result, but it is on my own randomly split test set, so I have no idea whether it faithfully reflects the paper's result.

mikelkl avatar Mar 07 '19 11:03 mikelkl

Hi @mikelkl, can you try it on the data split updated by the author?
My result is still far from the paper's.

airkid avatar Mar 07 '19 13:03 airkid

Hi @airkid, I'm afraid I cannot do that because I don't have the ACE 2005 English data.

mikelkl avatar Mar 11 '19 09:03 mikelkl

Hi @airkid, would you please tell me the result you got? I only reached F1 = 0.64 on trigger classification.

carrie0307 avatar Sep 05 '19 11:09 carrie0307

> https://github.com/lx865712528/JMEE/blob/494451d5852ba724d273ee6f97602c60a5517446/enet/testing.py#L72
> If I add `assert len(arguments) == len(arguments_)` before this line, there will be an assertion error. I believe this is because `arguments` holds the gold arguments while `arguments_` holds only the predicted arguments, whose length changes dynamically during training.

Hi,

If you've tried their code, would you tell me your reproduced results on trigger detection and argument detection?

rhythmswing avatar Jul 15 '20 05:07 rhythmswing