python-goose
Error in title extractor
I'm getting this error with this site and a few other websites: http://nguyenminhson.vxartgallery.com/
File "/home/nargiza/virtualenvs/myenv/local/lib/python2.7/site-packages/goose_extractor-1.0.25-py2.7.egg/goose/extractors/title.py", line 56, in clean_title
if title_words[0] in TITLE_SPLITTERS:
exceptions.IndexError: list index out of range
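The traceback shows `title_words[0]` being read from an empty list. Below is a minimal sketch of how that can happen and how a guard avoids it. It assumes `title_words` comes from a whitespace split of the title, so an empty or whitespace-only title yields `[]`. The function and the splitter set are illustrative stand-ins, not goose's actual code.

```python
# Illustrative splitter set (assumption); goose defines its own TITLE_SPLITTERS.
TITLE_SPLITTERS = {"|", "-", ":"}


def clean_title(title):
    """Drop a leading/trailing splitter token from a title (hypothetical sketch)."""
    title_words = title.split()  # "" or "   " -> [], so title_words[0] would fail
    # Guard: an empty list has no first or last element.
    if not title_words:
        return ""
    if title_words[0] in TITLE_SPLITTERS:
        title_words.pop(0)
    # Re-check: popping may have emptied the list (e.g. a title of just "|").
    if title_words and title_words[-1] in TITLE_SPLITTERS:
        title_words.pop(-1)
    return " ".join(title_words).strip()
```

For example, `clean_title("")` returns `""` instead of raising, and `clean_title("| My article")` returns `"My article"`.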
I'm also encountering this same error.
Ditto!
Getting the same error as well.
Same here:
article = goose.extract(raw_html = data)
File "/usr/local/lib/python2.7/dist-packages/goose/__init__.py", line 56, in extract
return self.crawl(cc)
File "/usr/local/lib/python2.7/dist-packages/goose/__init__.py", line 66, in crawl
article = crawler.crawl(crawl_candiate)
File "/usr/local/lib/python2.7/dist-packages/goose/crawler.py", line 154, in crawl
self.article.title = self.title_extractor.extract()
File "/usr/local/lib/python2.7/dist-packages/goose/extractors/title.py", line 99, in extract
return self.get_title()
File "/usr/local/lib/python2.7/dist-packages/goose/extractors/title.py", line 78, in get_title
return self.clean_title(title)
File "/usr/local/lib/python2.7/dist-packages/goose/extractors/title.py", line 56, in clean_title
if title_words[0] in TITLE_SPLITTERS:
IndexError: list index out of range
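Until the extractor itself guards against an empty word list, one caller-side workaround is to catch the `IndexError` around the `extract` call shown in the traceback. This is a minimal sketch. `safe_extract` is a hypothetical helper, not part of goose. Catching `IndexError` this broadly will also hide unrelated index errors raised inside the extractor.

```python
import logging

logger = logging.getLogger(__name__)


def safe_extract(extract, **kwargs):
    """Call an extractor and return None if it raises IndexError.

    `extract` is any callable, e.g. the bound `goose.extract` from the
    traceback; this wrapper is a hypothetical workaround, not goose API.
    """
    try:
        return extract(**kwargs)
    except IndexError:
        # Matches the clean_title crash, but also swallows other IndexErrors.
        logger.warning("extraction failed with IndexError; skipping", exc_info=True)
        return None
```

Usage would look like `article = safe_extract(goose.extract, raw_html=data)`, followed by a check for `None` before reading `article.title`.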
Same here
Bump. Seeing the same error.