
Text length exceeds maximum of 1000000

Open rxlian opened this issue 4 years ago • 9 comments

Hi, I got an error while feeding the text into the summarizer as follows.

ValueError: [E088] Text of length 1519175 exceeds maximum of 1000000. The v2.x parser and NER models require roughly 1GB of temporary memory per 100,000 characters in the input. This means long texts may cause memory allocation errors. If you're not using the parser or NER, it's probably safe to increase the nlp.max_length limit. The limit is in number of characters, so you can check whether your inputs are too long by checking len(text).

I tried adding:

nlp = spacy.load("en_core_web_sm")
nlp.max_length = 1519175

but it doesn't work.

So I was wondering, is there any way to address this issue? Thanks.

rxlian avatar Jun 11 '20 23:06 rxlian
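A likely reason the snippet above has no effect: the summarizer builds its own spaCy pipeline internally through its SentenceHandler, so raising max_length on a separately loaded model never reaches the pipeline that actually processes the text. Below is a minimal sketch of a workaround, assuming a 2020-era bert-extractive-summarizer where SentenceHandler lives at summarizer.sentence_handler and exposes its spaCy object as .nlp (the import path may differ in newer releases, and the input file name is a placeholder):

```python
from spacy.lang.en import English

from summarizer import Summarizer
from summarizer.sentence_handler import SentenceHandler  # path may vary by version

text = open("long_document.txt").read()  # hypothetical 1,519,175-character input

# Build the sentence handler the summarizer will use, then raise the
# character limit on its internal spaCy pipeline before summarizing.
handler = SentenceHandler(language=English)
handler.nlp.max_length = 2_000_000  # must exceed len(text)

model = Summarizer(sentence_handler=handler)
summary = model(text, ratio=0.1)
```

Since this handler only runs a sentencizer (no parser or NER), raising the limit should be safe per the error message quoted above.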

By the way, my transformers version is 2.3.0 instead of 2.2.2; everything else is the same.

rxlian avatar Jun 11 '20 23:06 rxlian

Were you able to figure out a workaround for this? Thanks

paulowoicho avatar Jul 17 '20 13:07 paulowoicho

This looks to be a common issue with spaCy at the moment, especially max_length not taking effect. I may look at adding docs for handling multi-line documents with spaCy, which might resolve this issue.

dmmiller612 avatar Jul 19 '20 23:07 dmmiller612
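Until such docs exist, a common workaround in the spirit of the comment above is to split long inputs into chunks below spaCy's 1,000,000-character default and summarize each chunk independently. A rough sketch; the chunk size, the newline-based splitting, and the chunk_text helper are illustrative choices, not part of the library:

```python
from summarizer import Summarizer

MAX_CHARS = 900_000  # stay safely under spaCy's 1,000,000-character default

def chunk_text(text, max_chars=MAX_CHARS):
    """Greedily pack newline-separated paragraphs into chunks below max_chars."""
    chunks, current, size = [], [], 0
    for para in text.split("\n"):
        # Start a new chunk once adding this paragraph would exceed the limit.
        if current and size + len(para) + 1 > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(para)
        size += len(para) + 1
    if current:
        chunks.append("\n".join(current))
    return chunks

text = open("long_document.txt").read()  # hypothetical long input
model = Summarizer()
summary = "\n".join(model(chunk, ratio=0.1) for chunk in chunk_text(text))
```

This assumes paragraphs are newline-separated and each stays under the limit; joining per-chunk summaries loses cross-chunk context, so treat the result as an approximation.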

+1

caramdache avatar Jul 27 '20 20:07 caramdache

Has this been solved?

rxlian avatar Aug 27 '20 00:08 rxlian

Has this been solved?

lucasgsfelix avatar Mar 23 '21 17:03 lucasgsfelix

+1

RobinVds avatar Nov 15 '21 04:11 RobinVds

+1

srknowdis avatar Nov 23 '21 07:11 srknowdis

Did anyone solve this problem?

KTG1 avatar Mar 13 '22 11:03 KTG1