
Text length exceeds maximum of 1000000

Open rxlian opened this issue 4 years ago • 9 comments

Hi, I got an error while feeding the text into the summarizer as follows.

ValueError: [E088] Text of length 1519175 exceeds maximum of 1000000. The v2.x parser and NER models require roughly 1GB of temporary memory per 100,000 characters in the input. This means long texts may cause memory allocation errors. If you're not using the parser or NER, it's probably safe to increase the nlp.max_length limit. The limit is in number of characters, so you can check whether your inputs are too long by checking len(text).

I tried adding:

nlp = spacy.load("en_core_web_sm")
nlp.max_length = 1519175

but it doesn't work.

So I was wondering, is there any way to address this issue? Thanks.

rxlian avatar Jun 11 '20 23:06 rxlian
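A likely reason the snippet above has no effect: the summarizer builds its own spaCy pipeline internally through its SentenceHandler, so raising max_length on a separately loaded model never reaches the pipeline that actually processes the text. Below is a minimal sketch of a workaround, assuming a 2020-era bert-extractive-summarizer where SentenceHandler lives at summarizer.sentence_handler and exposes its spaCy object as .nlp (the import path may differ in newer releases, and the input file name is a placeholder):

```python
from spacy.lang.en import English

from summarizer import Summarizer
from summarizer.sentence_handler import SentenceHandler  # path may vary by version

text = open("long_document.txt").read()  # hypothetical 1,519,175-character input

# Build the sentence handler the summarizer will use, then raise the
# character limit on its internal spaCy pipeline before summarizing.
handler = SentenceHandler(language=English)
handler.nlp.max_length = 2_000_000  # must exceed len(text)

model = Summarizer(sentence_handler=handler)
summary = model(text, ratio=0.1)
```

Since this handler only runs a sentencizer (no parser or NER), raising the limit should be safe per the error message quoted above.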

By the way, my transformers version is 2.3.0 instead of 2.2.2; everything else is the same.

rxlian avatar Jun 11 '20 23:06 rxlian

Were you able to figure out a workaround for this? Thanks

paulowoicho avatar Jul 17 '20 13:07 paulowoicho

This looks to be a common issue with spaCy at the moment, especially max_length not taking effect. I may look at adding docs for handling multi-line documents with spaCy, which might resolve this issue.

dmmiller612 avatar Jul 19 '20 23:07 dmmiller612
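Until such docs exist, a common workaround in the spirit of the comment above is to split long inputs into chunks below spaCy's 1,000,000-character default and summarize each chunk independently. A rough sketch; the chunk size, the newline-based splitting, and the chunk_text helper are illustrative choices, not part of the library:

```python
from summarizer import Summarizer

MAX_CHARS = 900_000  # stay safely under spaCy's 1,000,000-character default

def chunk_text(text, max_chars=MAX_CHARS):
    """Greedily pack newline-separated paragraphs into chunks below max_chars."""
    chunks, current, size = [], [], 0
    for para in text.split("\n"):
        # Start a new chunk once adding this paragraph would exceed the limit.
        if current and size + len(para) + 1 > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(para)
        size += len(para) + 1
    if current:
        chunks.append("\n".join(current))
    return chunks

text = open("long_document.txt").read()  # hypothetical long input
model = Summarizer()
summary = "\n".join(model(chunk, ratio=0.1) for chunk in chunk_text(text))
```

This assumes paragraphs are newline-separated and each stays under the limit; joining per-chunk summaries loses cross-chunk context, so treat the result as an approximation.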

+1

caramdache avatar Jul 27 '20 20:07 caramdache

Has this been solved?

rxlian avatar Aug 27 '20 00:08 rxlian

Has this been solved?

lucasgsfelix avatar Mar 23 '21 17:03 lucasgsfelix

+1

RobinVds avatar Nov 15 '21 04:11 RobinVds

+1

srknowdis avatar Nov 23 '21 07:11 srknowdis

Did anyone solve this problem?

KTG1 avatar Mar 13 '22 11:03 KTG1