
Statistics of released dataset splits

deepaknlp opened this issue (status: open, 5 comments)

Hey @freesunshine0316,

Could you please confirm the train/dev/test statistics from your experiments? With the released data, I get roughly 75500/17934/11805. Is this correct?

Thanks

deepaknlp, Feb 15 '19 03:02

I released the data I used. Sorry, I'm busy with something else right now, but I assume your numbers are correct.

freesunshine0316, Feb 24 '19 02:02

I have the same question. The train/dev/test split in your GitHub repository is 71500/16758/11805, which differs from the one reported in your paper (split-1: 70,484/10,570/11,877; split-2: 86,635/8,965/8,964). If convenient, could you please share the code or data used to create the train/dev/test splits? @deepaknlp @freesunshine0316

Chevalier1024, Sep 16 '19 14:09

@freesunshine0316 The Split-2 data is available here: https://res.qyzhou.me/redistribute.zip

deepaknlp, Sep 18 '19 10:09

Hi @Fengfeng1024

"Split-2" (released by Zhou et al.) exactly matches the statistics in the paper, and @deepaknlp just shared the link. "Split-1" was originally released by Du et al., but we couldn't use it directly because it contains no information on answer positions. Instead, we used their provided doclist-xxx.txt files to generate our own data (provided in this repository). However, we mistakenly reported their train/dev/test statistics in our paper rather than those of our regenerated data.
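For readers trying to reproduce a Split-1 style dataset, the general idea of regenerating splits from document lists can be sketched as follows. This is a minimal, hypothetical sketch, not the repository's actual preprocessing code. It assumes the SQuAD v1.1 JSON layout (`data` → `paragraphs` → `qas` → `answers` with `answer_start`), that each doclist file holds one article title per line, and that those titles match SQuAD's `title` field exactly. The helper names `load_doclist`, `load_squad`, and `split_by_doclist` are illustrative. Because whole articles are assigned to a split, the resulting counts depend on the exact filtering choices (for example, skipping questions without answers), which is one reason regenerated splits may not match published statistics.

```python
import json
from typing import Dict, List, Set


def load_doclist(path: str) -> Set[str]:
    # Assumption: one article title per line; blank lines are ignored.
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def load_squad(path: str) -> dict:
    # Loads a SQuAD v1.1 style JSON file.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def split_by_doclist(squad: dict, doclists: Dict[str, Set[str]]) -> Dict[str, List[dict]]:
    """Assign every QA pair to the split whose doclist contains its article title.

    Articles listed in no doclist are dropped. The answer position is kept,
    since that is the information missing from the original Split-1 release.
    """
    splits: Dict[str, List[dict]] = {name: [] for name in doclists}
    for article in squad["data"]:
        # Find the (first) split that lists this article, if any.
        target = next((name for name, titles in doclists.items()
                       if article["title"] in titles), None)
        if target is None:
            continue
        for para in article["paragraphs"]:
            for qa in para["qas"]:
                if not qa["answers"]:
                    continue  # skip questions without an answer span
                ans = qa["answers"][0]  # keep only the first annotated answer
                splits[target].append({
                    "context": para["context"],
                    "question": qa["question"],
                    "answer": ans["text"],
                    "answer_start": ans["answer_start"],
                })
    return splits
```

Printing `{name: len(rows) for name, rows in splits.items()}` after running this on the SQuAD files gives the per-split example counts, which is the kind of train/dev/test statistic being compared in this thread.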

freesunshine0316, Sep 18 '19 17:09

Thank you so much for your help.

Chevalier1024, Sep 19 '19 07:09