PreSumm
Why is `summary_size=3` inside `greedy_selection` when creating BERT data?
The title is basically the question, but to elaborate: I'm going through the code step by step so that I can create the BERT-style data used in this model for other summarization datasets as well.
I noticed that inside `data_builder._format_to_bert`, the value passed to the `summary_size` argument of `greedy_selection` is `3`.
Why is this hard-coded like this? If my understanding is correct, `summary_size` basically refers to how many reference sentences there are for each src/tgt pair. There are many samples where `summary_size != 3`.
Even if I pass a `summary_size` other than 3, it still produces candidate summaries with only 3 sentences. This could also be why the ROUGE scores are low on my other dataset.
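To make the question concrete, here is a minimal sketch of how I picture a greedy selector with a `summary_size` cap behaving. This is not the PreSumm code: the function names `unigram_f1` and `greedy_select_sketch` are hypothetical, and the scoring is a simplified unigram F1 standing in for ROUGE. In this sketch, `summary_size` only limits how many source sentences can be picked.

```python
from collections import Counter


def unigram_f1(candidate_tokens, reference_tokens):
    """Simplified unigram-overlap F1 (a stand-in for ROUGE, assumption)."""
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def greedy_select_sketch(src_sents, tgt_sents, summary_size):
    """Pick up to `summary_size` source sentence indices greedily.

    src_sents and tgt_sents are lists of tokenized sentences (lists of str).
    Hypothetical helper for illustration only.
    """
    reference = [tok for sent in tgt_sents for tok in sent]
    selected = []
    best_score = 0.0
    for _ in range(summary_size):
        best_idx = -1
        for i in range(len(src_sents)):
            if i in selected:
                continue
            # Score the summary formed by the current picks plus sentence i.
            candidate = [tok for j in selected + [i] for tok in src_sents[j]]
            score = unigram_f1(candidate, reference)
            if score > best_score:
                best_score = score
                best_idx = i
        if best_idx == -1:
            # No remaining sentence improves the score, so stop early.
            break
        selected.append(best_idx)
    return sorted(selected)


src = [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"], ["x", "y"]]
tgt = [["a", "b", "c", "d"], ["e", "f", "g", "h"]]
print(greedy_select_sketch(src, tgt, summary_size=3))
print(greedy_select_sketch(src, tgt, summary_size=5))
```

With a cap of 3 the sketch stops at three sentences even though a fourth would still improve the score. With a cap of 5 it stops on its own once nothing helps. If the real `greedy_selection` works along these lines, the hard-coded `3` would be an upper bound on selected sentences, not a count of reference sentences, but that is my assumption.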