MMN
Exact training, validation and test sets in your experiments
Can you share the exact processed training, validation and test sets you used for your experiments? That way we can make an apples-to-apples comparison of results.
Can you also share the Rouge scoring script that you used?
Hi, once you open the provided JSON files, you will find keys with a '_tokenized' suffix; those are our preprocessed versions. You can build the exact same vocabulary from those tokens. (And we are currently working on releasing our code. Sorry for the late upload.)
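For example, here is a minimal sketch of building the vocabulary from those keys (the file name and whether the fields hold strings or token lists are assumptions on my part, so adjust to the actual JSON):

```python
import json
from collections import Counter

# A minimal sketch; the file name and the exact '_tokenized' field layout
# are assumptions based on the description above.
counter = Counter()
with open("tifu_all_tokenized_and_filtered.json") as f:
    for line in f:  # assuming one JSON object per line
        post = json.loads(line)
        for key, value in post.items():
            if not key.endswith("_tokenized") or not value:
                continue
            tokens = value.split() if isinstance(value, str) else value
            counter.update(tokens)

# Keep the 15k most frequent tokens (the cutoff discussed below).
vocab = {tok for tok, _ in counter.most_common(15000)}
```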
And we use pltrdy's implementation of the ROUGE score (https://github.com/pltrdy/rouge) for the TIFU-short and TIFU-long datasets, and the original ROUGE-1.5.5 (`Rouge155`) for the Newsroom-abs and XSum datasets.
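Scoring with that package looks roughly like this (the strings below are just placeholders):

```python
# pip install rouge   (pltrdy's package)
from rouge import Rouge

rouge = Rouge()
hyps = ["model generated summary goes here"]  # placeholder
refs = ["ground truth summary goes here"]     # placeholder

# Average ROUGE-1 / ROUGE-2 / ROUGE-L over all hypothesis-reference pairs.
scores = rouge.get_scores(hyps, refs, avg=True)
print(scores["rouge-1"]["f"], scores["rouge-2"]["f"], scores["rouge-l"]["f"])
```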
Thanks
Thanks, but the problem still remains: how do I split the data into the exact train / validation / test sets used in the paper? The paper states "We randomly split the dataset into 95% for training, 5% for test". Can you share which datapoints ended up in the 5% test set, which were in the 95% training set, and which of the training ones ended up in your validation set? For example, the XSum dataset provides the exact train / valid / test splits used in its experiments. Can you do the same for yours? That would make results obtained by others comparable to yours.
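For concreteness, this is the kind of seeded split that would make the 95/5 partition reproducible (illustrative only, not the split from the paper; the seed and the validation fraction are my own placeholders):

```python
import random

def split_dataset(posts, seed=0, test_frac=0.05, valid_frac=0.05):
    # Illustrative only: NOT the split used in the paper. A published seed
    # (or explicit ID lists) is what would make this reproducible.
    idx = list(range(len(posts)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_frac)
    test = [posts[i] for i in idx[:n_test]]
    rest = idx[n_test:]
    n_valid = int(len(rest) * valid_frac)
    valid = [posts[i] for i in rest[:n_valid]]
    train = [posts[i] for i in rest[n_valid:]]
    return train, valid, test
```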
Also, after taking the most frequent 15k words from the "_tokenized" keys, I am left with some datapoints whose summary is only one word and becomes an empty string after OOV removal. How did you handle these cases? The paper says "We do not take OOV words into consideration". Does this mean you simply remove OOV words, or replace them with an OOV token? Do these words affect the ROUGE scores?
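To make the ambiguity concrete, these are the two readings I can see (the `<unk>` symbol and the function names here are mine):

```python
UNK = "<unk>"  # placeholder symbol; the actual token, if any, is unclear from the paper

def drop_oov(tokens, vocab):
    # Reading 1: OOV words are removed outright (can leave an empty summary).
    return [t for t in tokens if t in vocab]

def replace_oov(tokens, vocab):
    # Reading 2: OOV words are mapped to a single UNK token.
    return [t if t in vocab else UNK for t in tokens]
```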
@n-hossain Hi, did you find the answer to this? I'm looking for the same exact splits of data...
Hi @sajastu, same here... How did you deal with the splits?
Hi @cylnlp
I finally decided to go with the Pegasus split of the data; I got the same response from the authors.
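In case it helps others, that split can be loaded through TensorFlow Datasets; as far as I remember, Pegasus sliced the single reddit_tifu train split by percentage (the 80/10/10 proportions below are my recollection, so please verify against the Pegasus code):

```python
import tensorflow_datasets as tfds

# My recollection of the Pegasus-style slicing of reddit_tifu/long;
# verify the exact percentages against the Pegasus repository.
train = tfds.load("reddit_tifu/long", split="train[:80%]")
valid = tfds.load("reddit_tifu/long", split="train[80%:90%]")
test = tfds.load("reddit_tifu/long", split="train[90%:]")
```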
Hi @sajastu Thanks so much!