Question about R1, R2, RL score

Open phamkhactu opened this issue 2 years ago • 17 comments

@dqwang122 thanks for the great repo! I tested on the Multi-News dataset and got scores from evaluation.py, but they are very different from the scores published in your paper.

         R1       R2       RL
my test  35.6630  12.2370  31.3000
paper    46.05    16.35    42.08

my script is:

python evaluation.py --cuda --gpu 0  --model HDSG --save_root ./checkpoints --log_root ./log --use_pyrouge --test_model evalmultinews.ckpt -m 3

Maybe I made a mistake in some step. Many thanks for your response.

phamkhactu avatar Jul 09 '22 15:07 phamkhactu
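
To put the gap in the table above into numbers, here is a minimal sketch that computes the absolute difference (in ROUGE points) and the relative difference (in percent) between the paper's scores and the reproduced ones. The function and variable names are illustrative only and are not part of the repository.

```python
# Scores copied from the table in this issue.
paper = {"R1": 46.05, "R2": 16.35, "RL": 42.08}
mine = {"R1": 35.6630, "R2": 12.2370, "RL": 31.3000}


def score_gaps(reported, reproduced):
    """Return {metric: (absolute gap in points, relative gap in percent)}."""
    gaps = {}
    for metric, ref in reported.items():
        diff = ref - reproduced[metric]
        gaps[metric] = (round(diff, 2), round(100.0 * diff / ref, 1))
    return gaps


for metric, (abs_gap, rel_gap) in score_gaps(paper, mine).items():
    print(f"{metric}: {abs_gap} points lower ({rel_gap}% relative)")
```

Reporting both numbers avoids ambiguity when someone describes a difference as "X%", which could mean either points or a relative change.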

For Multi-News you need to set -m to 9, but I ran it on a 3080 and still only got R1: 40.4. If you have finished running it, could you share your results?

suwu-suwu avatar Jul 21 '22 13:07 suwu-suwu

@suwu-suwu I ran the command above. The difference between your run and mine is that I used -m 3; my R1 is 35.6630.

phamkhactu avatar Jul 21 '22 15:07 phamkhactu

@phamkhactu Right, you just need to set -m to 9. What result do you get after setting it to 9?

suwu-suwu avatar Jul 21 '22 15:07 suwu-suwu

Could you share an email address so we can discuss this code? @phamkhactu

suwu-suwu avatar Jul 21 '22 15:07 suwu-suwu

@suwu-suwu yes, you can contact me at [email protected], but please write in English so that I can understand.

phamkhactu avatar Jul 22 '22 01:07 phamkhactu

Did you use the released checkpoint and set -m to 9 for multi-news datasets? Or could you test your ROUGE installation by using the released multi-news outputs to calculate the ROUGE score?

dqwang122 avatar Aug 17 '22 07:08 dqwang122

Yes, I get a ROUGE score on the published output and a 6% difference on the multipurpose news dataset from the data listed by the author

------------------ Original message ------------------ From: "Danqing @.>; Sent: Wednesday, August 17, 2022, 3:18 PM; To: @.>; Cc: @.>; @.>; Subject: Re: [dqwang122/HeterSumGraph] Question about R1, R2, RL score (Issue #32)

suwu-suwu avatar Aug 17 '22 07:08 suwu-suwu

What does "multipurpose news dataset" refer to? Is it Multi-News? What exactly is the ROUGE score you got? Is it R1 40.4? If you cannot get the reported scores (R1 46.05) from the released outputs, you should check your ROUGE installation. You can follow the instructions here (https://github.com/dqwang122/HeterSumGraph#rouge-installation). You should also recheck the data format and preprocessing.

dqwang122 avatar Aug 17 '22 08:08 dqwang122

My ROUGE installation should be fine, as I have no problem at all with the CNN/DailyMail dataset, but the ROUGE scores on the Multi-News dataset are: ROUGE-1 = 40.4, ROUGE-2 = 15.7, ROUGE-L = 35.5

suwu-suwu avatar Aug 17 '22 08:08 suwu-suwu

What ROUGE score did you get on the Multi-News dataset?

suwu-suwu avatar Aug 17 '22 08:08 suwu-suwu

Could you recheck your data preprocessing? Since you have no problem with CNN/DM, there may be something wrong with your multi-news data.

Before we released the checkpoints and outputs, we reproduced all the results and could get similar results as reported in our paper.

dqwang122 avatar Aug 17 '22 08:08 dqwang122

Dear author, I would like to ask: when I run the outputs you released through evaluation.py, I still get 40.6. Does this mean my data preprocessing is wrong?

suwu-suwu avatar Aug 17 '22 09:08 suwu-suwu

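Dear author, could you send me a copy of the preprocessed Multi-News data? My address is [email protected]. It is fine if that is inconvenient; I am already very happy to have received your reply.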

suwu-suwu avatar Aug 17 '22 09:08 suwu-suwu

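If you use the released outputs, you do not need to call evaluation.py. Use utils.pyrouge_score_all() directly to compare our outputs with the ground truth and determine whether this is a data problem. The result of this direct test should be consistent with the paper. If the difference is still large, then it is either a data problem or a ROUGE problem; please pay attention to the format of the input data.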

dqwang122 avatar Aug 17 '22 09:08 dqwang122
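
The direct check suggested above compares the released output summaries with the ground-truth summaries without going through evaluation.py. The signature of utils.pyrouge_score_all() is not shown in this thread, so the sketch below does not call it. Instead, it uses a simplified pure-Python ROUGE-N F1 (whitespace tokenization, no stemming, hypothetical function names) as a quick sanity check that hypothesis and reference lists line up. Its numbers will not match ROUGE-1.5.5 exactly; a very low score mainly signals misaligned or badly formatted data.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(hypothesis, reference, n=1):
    """Simplified ROUGE-N F1 between two strings (not the official scorer)."""
    hyp = ngrams(hypothesis.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as in the reference.
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def corpus_rouge_n(hyps, refs, n=1):
    """Average per-example ROUGE-N F1; lists must be aligned one-to-one."""
    if len(hyps) != len(refs):
        raise ValueError("hypotheses and references differ in length")
    if not hyps:
        raise ValueError("no examples to score")
    return sum(rouge_n_f1(h, r, n) for h, r in zip(hyps, refs)) / len(hyps)
```

A length mismatch raises an error on purpose: if the outputs and the ground truth do not have the same number of examples, the data were probably preprocessed differently.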

OK, thank you very much for taking the time to answer despite your busy schedule.

suwu-suwu avatar Aug 17 '22 09:08 suwu-suwu

Dear author, I would like to ask: when I run the outputs you released through evaluation.py, I still get 40.6. Does this mean my data preprocessing is wrong?

suwu-suwu avatar Oct 11 '22 08:10 suwu-suwu

Why are my ROUGE scores on the CNN/DailyMail dataset not as good as those in the paper? Any help would be appreciated!

yangmuli78 avatar Mar 12 '23 08:03 yangmuli78