python-goose

Goose fails to extract data from the Kissmetrics blog, which has some meta tags present.

Open • jijoy opened this issue 9 years ago • 1 comment

I am trying to extract content from http://feedproxy.google.com/~r/KISSmetrics/~3/cmb43Q4Mzak/ which redirects to https://blog.kissmetrics.com/optimize-your-social-media-ad-spend-with-advanced-targeting-options/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+KISSmetrics+%28KISSmetrics+Marketing+Blog%29

I am getting the error below.

  File "D:\env\lib\site-packages\goose\__init__.py", line 56, in extract
    return self.crawl(cc)
  File "D:\env\lib\site-packages\goose\__init__.py", line 66, in crawl
    article = crawler.crawl(crawl_candiate)
  File "D:\env\lib\site-packages\goose\crawler.py", line 154, in crawl
    self.article.title = self.title_extractor.extract()
  File "D:\env\lib\site-packages\goose\extractors\title.py", line 99, in extract
    return self.get_title()
  File "D:\env\lib\site-packages\goose\extractors\title.py", line 78, in get_title
    return self.clean_title(title)
  File "D:\env\lib\site-packages\goose\extractors\title.py", line 42, in clean_title
    title = title.replace(site_name, '').strip()
TypeError: expected a character buffer object

I think it's because of the site_name OpenGraph tag on the website, since the failing line passes site_name to title.replace().
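To illustrate what I suspect is happening, here is a minimal sketch of a guarded version of the title-cleaning step. It is not the library's code: `clean_title_safely` and the plain `opengraph` dict are hypothetical stand-ins, and I am assuming the site_name value on this page ends up as something other than a string (for example None), which I have not confirmed.

```python
# Hypothetical stand-in for the clean_title step seen in the traceback.
# Assumption: the OpenGraph site_name value may be missing or not a string.

try:
    string_types = (basestring,)  # Python 2
except NameError:
    string_types = (str,)  # Python 3


def clean_title_safely(title, opengraph):
    """Strip the site name from the title only when it is a usable string."""
    site_name = opengraph.get("site_name")
    # str.replace raises TypeError if its first argument is not a string,
    # so skip the replacement for None, empty, or non-string values.
    if isinstance(site_name, string_types) and site_name:
        title = title.replace(site_name, "")
    return title.strip()


if __name__ == "__main__":
    print(clean_title_safely("Some post title Example", {"site_name": "Example"}))
    print(clean_title_safely("Some post title", {"site_name": None}))
```

With a guard like this, a page whose site_name is missing or malformed would keep its title unchanged instead of raising TypeError.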

jijoy, Oct 25 '15 14:10

Please help me out.

jijoy, Oct 25 '15 14:10