TransformerSum

Possible to do sub-sentence level extractive summarization?

Hellisotherpeople opened this issue 3 years ago · 1 comment

After reading the documentation, it looks like the extractive summarization components only score sentences. While this is how the vast majority of extractive summarization papers work, some extractive summarization systems and datasets work at the word level of granularity (my own work, for example, is exclusively word-level extractive summarization).

Is there some way to make TransformerSum work at the word level of granularity out of the box? When I trained word-level extractive models, I used a token classification head on top of the encoder. Maybe that could be implemented here alongside the current sentence-scoring heads?
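For illustration, a minimal sketch of a token classification head for word-level extractive scoring might look like the following. The model name and class are placeholders, not part of TransformerSum's existing API:

```python
# Minimal sketch (not TransformerSum code): word-level extractive summarization
# framed as token classification. Model name and class are placeholders.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

class TokenExtractiveSummarizer(nn.Module):
    """Scores every token with a keep/drop logit instead of scoring whole sentences."""

    def __init__(self, model_name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        # One logit per token: a high score means "include this word in the summary".
        self.classifier = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                                   # (batch, seq_len, hidden)
        return self.classifier(hidden).squeeze(-1)            # (batch, seq_len)


tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TokenExtractiveSummarizer()
batch = tokenizer(["An example document to summarize."], return_tensors="pt")
token_scores = torch.sigmoid(model(batch["input_ids"], batch["attention_mask"]))
```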

Hellisotherpeople avatar Dec 21 '20 02:12 Hellisotherpeople

@Hellisotherpeople Out of the box, TransformerSum only supports extractive summarization at the sentence level; it does not yet support word-level granularity. This could be added to the library, but there are no concrete plans to do so yet since I'm not familiar with word-level extractive summarization. One possibility is to add an option to the pooling module that passes the token vectors straight to a classifier instead of condensing them into sentence vectors (see the rough sketch below). The testing/evaluation code would also need to change to work at the word level. I will look into this sometime this week.
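A rough sketch of that pooling option, assuming a pass-through mode plus a token-level scorer; the class names and signatures here are illustrative assumptions, not the current pooling module's API:

```python
# Illustrative sketch only: a "no pooling" path that keeps one vector per token
# and scores each token, instead of aggregating tokens into sentence vectors.
# Class names and signatures are assumptions, not TransformerSum's pooling API.
import torch
from torch import nn

class TokenLevelPooling(nn.Module):
    """Pass-through 'pooling': return the raw token vectors unchanged."""

    def forward(self, word_vectors, attention_mask):
        # word_vectors: (batch, seq_len, hidden) from the transformer encoder.
        return word_vectors, attention_mask


class TokenScorer(nn.Module):
    """Scores each token with a keep/drop logit, analogous to the sentence scorer."""

    def __init__(self, hidden_size):
        super().__init__()
        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, word_vectors, attention_mask):
        logits = self.linear(word_vectors).squeeze(-1)        # (batch, seq_len)
        # Mask padding positions so they can never be selected for the summary.
        return logits.masked_fill(attention_mask == 0, float("-inf"))


pooling, scorer = TokenLevelPooling(), TokenScorer(hidden_size=768)
word_vectors = torch.randn(2, 16, 768)                        # stand-in encoder output
attention_mask = torch.ones(2, 16, dtype=torch.long)
vectors, mask = pooling(word_vectors, attention_mask)
token_logits = scorer(vectors, mask)                          # word-level scores, shape (2, 16)
```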

HHousen avatar Dec 22 '20 01:12 HHousen