
Why is `summary_size=3` inside `greedy_selection` when creating BERT data?

Open seanswyi opened this issue 4 years ago • 2 comments

The title is basically the question. To elaborate: I'm going through the code step by step so that I can create the BERT-style data used by this model for other summarization datasets as well.

I noticed that inside `data_builder._format_to_bert`, the value passed to the `summary_size` argument of `greedy_selection` is 3.

Why is this hard-coded? If my understanding is correct, `summary_size` refers to how many reference sentences there are for each src/tgt pair. There are many samples where `summary_size != 3`.
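
To make the question concrete, here is a minimal sketch of a greedy oracle selection that takes a `summary_size` argument. It is an illustration under assumptions, not the repository's code: `unigram_f1` and `greedy_select_sketch` are hypothetical names, and plain unigram-overlap F1 stands in for whatever ROUGE-based score the real `greedy_selection` uses. In this sketch `summary_size` acts as an upper bound on how many source sentences get picked, and the loop stops early once no remaining sentence improves the score.

```python
from collections import Counter


def unigram_f1(candidate_tokens, reference_tokens):
    """Unigram-overlap F1, a rough stand-in for a ROUGE-style score (assumption)."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


def greedy_select_sketch(doc_sents, ref_sents, summary_size):
    """Hypothetical greedy selection: pick at most `summary_size` source sentences.

    doc_sents and ref_sents are lists of tokenized sentences (lists of strings).
    Returns the sorted indices of the selected source sentences.
    """
    reference = [tok for sent in ref_sents for tok in sent]
    selected = []
    best_score = 0.0
    for _ in range(summary_size):
        best_idx = -1
        round_best = best_score
        for i in range(len(doc_sents)):
            if i in selected:
                continue
            # Score the summary formed by the current selection plus sentence i.
            candidate = [tok for j in selected + [i] for tok in doc_sents[j]]
            score = unigram_f1(candidate, reference)
            if score > round_best:
                round_best, best_idx = score, i
        if best_idx == -1:
            # No remaining sentence improves the score, so stop early.
            break
        selected.append(best_idx)
        best_score = round_best
    return sorted(selected)
```

Under these assumptions, a cap of 3 limits the number of selected source sentences to at most 3 regardless of how many sentences the reference contains, and a larger cap only yields more sentences while each addition still raises the score.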

seanswyi • May 05 '20 08:05

Even if I give it a `summary_size` other than 3, it produces a candidate summary with only 3 sentences.

AyeshaSarwar • May 19 '20 02:05

This could also be the reason why the ROUGE scores are low for my other dataset.

AyeshaSarwar • May 19 '20 02:05