keras-text-summarization

Why does the generated sentence have little to do with the original?

Open uhauha2929 opened this issue 7 years ago • 1 comment

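I tested with a Chinese corpus, trying both word-level and character-level tokens. The generated sentences seem readable, but they differ considerably from the original text. I don't know what the cause is; is the model too simple?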

uhauha2929 Apr 27 '18 11:04

@uhauha2929 The text body and the summarized text use different vocabularies, which might explain what you observed. Also, max_sequence_length is applied to the text body, so nothing is read from the text body after the first max_sequence_length words. One way to address the issue you mentioned is to use a single vocabulary for both the text body and the summarized text, or to read more text from the text body. The language of the text body matters too: for Chinese, for example, I think using a Chinese tokenizer may give a better result.

chen0040 May 03 '18 04:05
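
The two suggestions in the reply (one vocabulary shared by text body and summary, and a larger max_sequence_length) can be sketched roughly as below. This is a minimal illustration in plain Python, not the repository's code: `build_shared_vocab`, `encode`, the `<pad>`/`<unk>` tokens, and `max_vocab_size` are all hypothetical names chosen for the example, and whitespace splitting stands in for a real tokenizer.

```python
from collections import Counter

# Hypothetical special tokens and helper names; not taken from the repository.
PAD, UNK = "<pad>", "<unk>"


def build_shared_vocab(texts, summaries, max_vocab_size=5000):
    """Build ONE word-to-index map from both text bodies and summaries."""
    counter = Counter()
    for doc in list(texts) + list(summaries):
        # Whitespace split is a placeholder; Chinese needs a real tokenizer.
        counter.update(doc.lower().split())
    vocab = {PAD: 0, UNK: 1}
    for word, _ in counter.most_common(max_vocab_size):
        vocab[word] = len(vocab)
    return vocab


def encode(doc, vocab, max_sequence_length):
    """Map words to ids, truncate to max_sequence_length, pad the rest."""
    ids = [vocab.get(word, vocab[UNK]) for word in doc.lower().split()]
    ids = ids[:max_sequence_length]  # words past this point are never seen
    return ids + [vocab[PAD]] * (max_sequence_length - len(ids))


texts = ["the cat sat on the mat near the door"]
summaries = ["cat sat on mat"]
vocab = build_shared_vocab(texts, summaries)

# With a small limit, only the first four words of the body are encoded.
short = encode(texts[0], vocab, max_sequence_length=4)
# With a larger limit, the whole body is encoded and the tail is padded.
full = encode(texts[0], vocab, max_sequence_length=12)
# The summary is encoded with the same ids as the body.
summary_ids = encode(summaries[0], vocab, max_sequence_length=4)
```

Because both sides share one index, a word such as "cat" gets the same id in the input and in the target. For Chinese text, the `split()` calls would need to be replaced by a word segmenter (for instance `jieba.lcut`, assuming the jieba package is installed) or by character-level splitting.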