python-goose
Goose fails to extract data from the Kissmetrics blog, which has some meta tags present.
I am trying to extract content from http://feedproxy.google.com/~r/KISSmetrics/~3/cmb43Q4Mzak/, which redirects to https://blog.kissmetrics.com/optimize-your-social-media-ad-spend-with-advanced-targeting-options/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+KISSmetrics+%28KISSmetrics+Marketing+Blog%29
I am getting the error below.
  File "D:\env\lib\site-packages\goose\__init__.py", line 56, in extract
    return self.crawl(cc)
  File "D:\env\lib\site-packages\goose\__init__.py", line 66, in crawl
    article = crawler.crawl(crawl_candiate)
  File "D:\env\lib\site-packages\goose\crawler.py", line 154, in crawl
    self.article.title = self.title_extractor.extract()
  File "D:\env\lib\site-packages\goose\extractors\title.py", line 99, in extract
    return self.get_title()
  File "D:\env\lib\site-packages\goose\extractors\title.py", line 78, in get_title
    return self.clean_title(title)
  File "D:\env\lib\site-packages\goose\extractors\title.py", line 42, in clean_title
    title = title.replace(site_name, '').strip()
TypeError: expected a character buffer object
I think it's caused by the site_name OpenGraph tag (og:site_name) on the website: the traceback fails where clean_title passes site_name to title.replace().
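In Python 2, "expected a character buffer object" is what str.replace raises when its first argument is not a string, so the site_name value read from the page is presumably None, a list, or some other non-string. Below is a minimal sketch of the kind of guard that would avoid the crash. The helper name strip_site_name is hypothetical and not part of goose, and the assumption that site_name can arrive as None or as a list is mine; I have not confirmed what value goose actually sees for this page.

```python
# Hypothetical helper, NOT part of goose: a defensive version of the
# title.replace(site_name, '') step that fails in the traceback.

try:
    string_types = (str, unicode)  # Python 2: accept both str and unicode
except NameError:
    string_types = (str,)          # Python 3: unicode no longer exists


def strip_site_name(title, site_name):
    """Remove site_name from title, tolerating missing or non-string values."""
    # Assumption: site_name may be None/empty when the tag has no usable value.
    if not site_name:
        return title.strip()

    # Assumption: repeated tags may be collected into a list or tuple.
    if isinstance(site_name, (list, tuple)):
        candidates = site_name
    else:
        candidates = [site_name]

    for name in candidates:
        # Only call replace() with real, non-empty strings.
        if isinstance(name, string_types) and name:
            title = title.replace(name, "")
    return title.strip()
```

For example, strip_site_name("Some Title", None) returns the title unchanged instead of raising TypeError.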
Please help me out.