python-goose

Error in title extractor

Open nargiza-sarkulova opened this issue 9 years ago • 6 comments

I'm getting this error with this site and a few others: http://nguyenminhson.vxartgallery.com/

  File "/home/nargiza/virtualenvs/myenv/local/lib/python2.7/site-packages/goose_extractor-1.0.25-py2.7.egg/goose/extractors/title.py", line 56, in clean_title
    if title_words[0] in TITLE_SPLITTERS:
exceptions.IndexError: list index out of range

nargiza-sarkulova avatar Jan 08 '15 16:01 nargiza-sarkulova

I'm also encountering this same error.

lsemel avatar Mar 26 '15 22:03 lsemel

Ditto!

yang avatar Jun 09 '15 18:06 yang

Getting the same error as well.

blakeapm avatar Jul 16 '15 18:07 blakeapm

same here:

article = goose.extract(raw_html = data)
  File "/usr/local/lib/python2.7/dist-packages/goose/__init__.py", line 56, in extract
    return self.crawl(cc)
  File "/usr/local/lib/python2.7/dist-packages/goose/__init__.py", line 66, in crawl
    article = crawler.crawl(crawl_candiate)
  File "/usr/local/lib/python2.7/dist-packages/goose/crawler.py", line 154, in crawl
    self.article.title = self.title_extractor.extract()
  File "/usr/local/lib/python2.7/dist-packages/goose/extractors/title.py", line 99, in extract
    return self.get_title()
  File "/usr/local/lib/python2.7/dist-packages/goose/extractors/title.py", line 78, in get_title
    return self.clean_title(title)
  File "/usr/local/lib/python2.7/dist-packages/goose/extractors/title.py", line 56, in clean_title
    if title_words[0] in TITLE_SPLITTERS:
IndexError: list index out of range

daTokenizer avatar Sep 02 '15 10:09 daTokenizer

Same here

grigy avatar Jun 02 '16 05:06 grigy

Bump. Seeing the same error.

jlucier avatar Jun 27 '16 03:06 jlucier
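
Both tracebacks above end at `title_words[0]` in `clean_title`, which suggests the list of title words is empty, for example when a page has an empty or whitespace-only title. The following is only a minimal sketch of the kind of guard that would avoid the `IndexError`. It is not goose's actual code: `strip_edge_splitters` is a hypothetical helper, and the contents of `TITLE_SPLITTERS` here are an assumption standing in for the library's own list.

```python
# -*- coding: utf-8 -*-
# Assumed stand-in for goose's TITLE_SPLITTERS; the real list may differ.
TITLE_SPLITTERS = [u"|", u"-", u"»", u":"]


def strip_edge_splitters(title):
    """Hypothetical helper: drop a leading or trailing splitter token.

    Unlike the failing line in the traceback, this tolerates empty titles.
    """
    title_words = title.split()

    # An empty or whitespace-only title splits into an empty list, so
    # check that the list is non-empty before indexing into it.
    if title_words and title_words[0] in TITLE_SPLITTERS:
        title_words.pop(0)

    # Re-check: the list may have become empty after the pop above.
    if title_words and title_words[-1] in TITLE_SPLITTERS:
        title_words.pop(-1)

    return u" ".join(title_words).strip()
```

With this guard, an empty title comes back as an empty string instead of raising. Whether the affected pages really have empty titles is a guess from the traceback and has not been confirmed against the URL in the original report.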