HeterSumGraph
Question about R1, R2, RL score
@dqwang122 Thanks for the great repo! I tested on the multi-news dataset and computed scores with evaluation.py, but when I run the code, the scores are very different from the ones published in your paper.
|  | R1 | R2 | RL |
|---|---|---|---|
| my test | 35.6630 | 12.2370 | 31.3000 |
| paper | 46.05 | 16.35 | 42.08 |
My script is:
python evaluation.py --cuda --gpu 0 --model HDSG --save_root ./checkpoints --log_root ./log --use_pyrouge --test_model evalmultinews.ckpt -m 3
Maybe I went wrong at some step! Many thanks for your response.
If it is multi-news, you need to set -m to 9. But I ran it on a 3080 and only got R1: 40.4. If you have finished running, could you share your results?
@suwu-suwu I ran the command above; the difference between us is that I set -m to 3, and my R1 is 35.6630.
@phamkhactu Right, you just need to set -m to 9. So what results did you get after setting it to 9?
Could you share an email address so we can discuss this code? @phamkhactu
@suwu-suwu Yes, you can contact me at [email protected], but please write in English so I can understand.
Did you use the released checkpoint and set -m to 9 for multi-news datasets? Or could you test your ROUGE installation by using the released multi-news outputs to calculate the ROUGE score?
Yes, I got a ROUGE score on the published output, and there is about a 6% gap on the multipurpose news dataset compared with the numbers listed by the author.
What does "multipurpose news dataset" refer to? Is it the multi-news? What is the exact "a ROUGE score"? Is it R1 40.4? If you cannot get the reported scores (R1 46.05) from the released outputs, you had better check the installation of ROUGE. You can follow the instruction here(https://github.com/dqwang122/HeterSumGraph#rouge-installation). Besides, you should also recheck the data format and preprocessing.
My ROUGE installation should be fine, since I have no problem with the CNN/DailyMail dataset at all, but the ROUGE score on the Multi-News dataset is: ROUGE-1 = 40.4, ROUGE-2 = 15.7, ROUGE-L = 35.5.
What ROUGE score did you get on the Multi-News dataset?
Could you recheck your data preprocessing? Since you have no problem with CNN/DM, there may be something wrong with your multi-news data.
Before we released the checkpoints and outputs, we reproduced all the results and could get similar results as reported in our paper.
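A quick way to rule out obvious preprocessing problems is to inspect a few examples from the preprocessed file directly. The sketch below only assumes the data is one JSON object per line; the path is a placeholder, and the printed keys are whatever your own preprocessing produced.

```python
# Sketch: eyeball the first few preprocessed Multi-News examples.
# Assumes only that the file has one JSON object per line; the path is hypothetical.
import json

path = "data/multinews/test.label.jsonl"   # hypothetical path, adjust to your data
with open(path, encoding="utf-8") as f:
    for i, line in enumerate(f):
        example = json.loads(line)          # raises if a line is not valid JSON
        if i == 0:
            print("keys:", sorted(example.keys()))
        print(f"example {i}:", {k: str(v)[:80] for k, v in example.items()})
        if i == 2:
            break
```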
Dear author, I'd like to ask: when I run the outputs you released through evaluation.py, I also get only 40.6. Does this mean my data preprocessing is wrong?
Dear author, could you send me a copy of the preprocessed Multi-News data? [email protected]. If that's not convenient, no problem; I'm already very happy just to get your reply.
If you use the released outputs, you do not need to call evaluation.py. Use utils.pyrouge_score_all() directly to compare our outputs with the ground truth and determine whether it is a data problem. The result of this direct test should match the paper. If the difference is still large, then it is a data problem or a ROUGE problem; please pay attention to the format of the input data.
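For reference, a direct comparison along those lines might look like this sketch. The file paths are placeholders, and the signature of pyrouge_score_all is assumed to take two parallel lists of summary strings; please check the actual definition in the repo's utils module before using it.

```python
# Sketch: score the released outputs against the ground truth directly,
# bypassing evaluation.py. Paths are hypothetical, and the assumed signature of
# pyrouge_score_all (two parallel lists of summary strings) should be verified
# against the repo's utils module.
from tools import utils   # run from the HeterSumGraph repo root; adjust the import if needed

def load_summaries(path):
    """Read one summary per line from a plain-text file."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

hyps = load_summaries("released_outputs/multinews.txt")   # hypothetical path to released outputs
refs = load_summaries("data/multinews/test.target")       # hypothetical path to gold summaries
assert len(hyps) == len(refs), "hypotheses and references must align one-to-one"

scores = utils.pyrouge_score_all(hyps, refs)
print(scores)   # should land near the paper numbers: R1 46.05 / R2 16.35 / RL 42.08
```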
OK, thank you very much for taking the time out of your busy schedule to answer.
Why are my ROUGE results on the CNN/DailyMail dataset not as good as those in the paper? Any help would be appreciated!