CasEE
Some details
Hi, I read your code, which is excellent work. Here I list some details and questions about your work to avoid misunderstanding:
(1) The indicator function I(r,c) in your paper is meant to indicate whether the role r belongs to the type c. But in your code, you actually use the predefined event schema (i.e., ty_args_id, which contains the information given by ty_args.json). Accordingly, the indicator function does not really decide whether the role r belongs to the type c; it just acts as a computed weight coefficient that adjusts the score produced by the sigmoid function.
(2) Besides the event schema, you also use other prior information in prediction, such as the largest length of a trigger/argument. Is all of this prior information calculated only from the training data?
(3) In the evaluation metric, I find that the metrics for argument identification and argument classification miss trigger information, so they are not very strict. If trigger information is added (I mean, an argument is correctly identified only if its offsets, related trigger type, and the trigger's offsets exactly match a reference argument), the performance will decrease.
Hi, thanks for your comments.
The simple indicator function I(r,c) is usually learned well in our experiments, reflecting the correspondence between event types and argument roles. It helps the model avoid learning redundant roles that a type doesn't have, and thus improves model performance in our experiments.
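To make the gating concrete, here is a minimal sketch in PyTorch, with hypothetical tensor names and shapes (not the repository's actual code): the learned/loaded indicator scales the per-role sigmoid scores, suppressing roles outside the type's schema.

```python
import torch

# Minimal sketch, assuming hypothetical shapes: C event types,
# R argument roles, T tokens. `indicator` plays the part of I(r, c):
# indicator[c, r] is (near) 1 if role r belongs to type c, else (near) 0,
# e.g. built from ty_args.json or learned during training.
def gated_role_scores(role_logits: torch.Tensor,   # (R, T) raw role scores
                      indicator: torch.Tensor,     # (C, R) role-type table
                      event_type: int) -> torch.Tensor:
    probs = torch.sigmoid(role_logits)             # per-token role scores
    weights = indicator[event_type].unsqueeze(1)   # (R, 1), broadcast over T
    return probs * weights  # roles outside the type's schema are suppressed
```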
Besides, the pre-defined event schema and the largest lengths actually work as additional post-processing, which slightly improves the performance. Note that they are obtained only from the training data, and are very easy to obtain by data statistics.
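A hedged sketch of such post-processing, with hypothetical names for the schema and length statistics (`schema`, `max_len`): predicted argument spans are dropped if their role is not in the event type's schema, or if they are longer than any training-set span for that role.

```python
# Hedged post-processing sketch (hypothetical names, not the repo's code):
# `schema` maps each event type to its role set (as in ty_args.json),
# `max_len` maps each role to the largest span length seen in training.
def filter_arguments(spans, event_type, schema, max_len):
    """spans: iterable of (role, start, end) predictions for one event."""
    kept = []
    for role, start, end in spans:
        if role not in schema[event_type]:
            continue  # role is not defined for this event type
        if end - start > max_len.get(role, 0):
            continue  # span longer than anything seen in training data
        kept.append((role, start, end))
    return kept
```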
As for the evaluation metric, we followed the metric code from the previous studies reported in the paper. You could also try other metric code for evaluation.
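For illustration, a small sketch of the stricter matching proposed in point (3), with a hypothetical tuple layout (this is not the metric code the paper used): an argument counts as correct only if its offsets, the trigger type, and the trigger offsets all match a gold argument; the lenient variant drops the trigger information.

```python
# Sketch of strict vs. lenient argument matching (hypothetical layout):
# pred/gold are sets of
# (arg_start, arg_end, role, trig_type, trig_start, trig_end) tuples.
def argument_f1(pred, gold, strict=True):
    if not strict:
        pred = {t[:3] for t in pred}  # keep only (arg_start, arg_end, role)
        gold = {t[:3] for t in gold}
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```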
Thank you so much. And the pre-defined event schema and the largest lengths are obtained only from the training data, not including the evaluation data, right?
Yes, correct.
Thank you for your reply. I did the statistics and found that the results computed in three ways (train only, train/dev, or train/dev/test) differ only in the length of the role "way"; all the other information in the pre-defined event schema and the largest lengths is the same.
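For reference, a hedged sketch of how such statistics could be collected per split (the record layout here is hypothetical, not the repository's data format), so that train, train/dev, and train/dev/test can be compared:

```python
from collections import defaultdict

# Hedged statistics sketch: gather the role set per event type and the
# largest argument length per role. Record layout is an assumption:
# {"type": str, "args": [{"role": str, "start": int, "end": int}, ...]}
def collect_stats(records):
    schema = defaultdict(set)   # event type -> observed roles
    max_len = defaultdict(int)  # role -> largest argument span length
    for rec in records:
        for arg in rec["args"]:
            schema[rec["type"]].add(arg["role"])
            length = arg["end"] - arg["start"]
            max_len[arg["role"]] = max(max_len[arg["role"]], length)
    return schema, max_len
```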
Yes, correct.
I forgot one important issue: your work/code currently suits only the "no negative samples" setting (all samples in your case contain an event). If I want to apply your work where negative samples exist in the train/dev/test data (such as the ACE event data), I need to adjust not only the training process but also the model itself, is that right?
Yes, correct.
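For what it's worth, one hedged sketch of such an adjustment (an assumption, not the authors' method): train the type detector with all-zero labels on event-less sentences, and skip the trigger/argument stages when no type score clears the threshold.

```python
import torch

# Hedged sketch (not the authors' code): with negative samples, the
# type-detection stage must be allowed to predict "no event at all".
def detect_event_types(type_logits: torch.Tensor, threshold: float = 0.5):
    """type_logits: (num_types,) raw type scores for one sentence."""
    probs = torch.sigmoid(type_logits)
    fired = (probs > threshold).nonzero(as_tuple=True)[0].tolist()
    return fired  # empty list -> negative sample; run no further extraction
```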
I reconstructed ACE2005 and ran the experiment with CasEE. Why are all my results around 0.000-0.001? Does your ACE experiment run properly?
Yes, it runs properly for me.
May I ask what performance you get on ACE2005? My F1 for event type detection is quite low after running it; is that normal?
It is not a good reference point: everyone preprocesses the raw ACE data somewhat differently, and it has been a while. Let me try to reproduce it when I have some time, and I'll let you know.