python-goose

HTML Content / Article Extractor, web scraping library in Python

Results 100 python-goose issues

Installation using pip failed. Traceback (most recent call last): File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/setuptools/sandbox.py", line 152, in save_modules yield saved File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/setuptools/sandbox.py", line 193, in setup_context yield File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/setuptools/sandbox.py", line 237, in run_setup...

Seeing extraction errors on certain websites that have titles. `File "/usr/local/lib/python2.7/site-packages/ContentAnalysis-0.1.1-py2.7.egg/ContentAnalysis/document.py", line 53, in parse ginfo = g.extract(url=self.link) File "/usr/local/lib/python2.7/site-packages/goose/__init__.py", line 56, in extract return self.crawl(cc) File "/usr/local/lib/python2.7/site-packages/goose/__init__.py", line 66,...

The `--use-mirrors` option has been deprecated in `pip` and was removed in version 7.0.0. This will cause the Travis check to fail for any new pull request.

When extracting from Forbes.com, Goose in many cases misses the needed data and returns unnecessary data instead. Here is the code: ``` >>> from goose import Goose >>> g = Goose() >>> art = g.extract(url='http://www.forbes.com/2009/03/18/federal-funds-commerce-ibm-markets-transcript-aig.html') >>>...

I'm getting this error with this and a few other websites: http://nguyenminhson.vxartgallery.com/ File "/home/nargiza/virtualenvs/myenv/local/lib/python2.7/site-packages/goose_extractor-1.0.25-py2.7.egg/goose/extractors/title.py", line 56, in clean_title if title_words[0] in TITLE_SPLITTERS: exceptions.IndexError: list index out of range
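An `IndexError` on `title_words[0]` means the list was empty when it was indexed, which would happen if the page title is empty or contains only whitespace. The following is a minimal sketch of a defensive guard, not the library's actual fix: the function name `strip_edge_splitters` and the contents of `TITLE_SPLITTERS` here are assumptions made for illustration, and only the names `title_words` and `TITLE_SPLITTERS` come from the traceback.

```python
# Hypothetical splitter list for illustration; the real TITLE_SPLITTERS
# defined in goose may contain different values.
TITLE_SPLITTERS = ["|", "-", ":"]


def strip_edge_splitters(title):
    """Drop a leading or trailing splitter token, tolerating empty titles."""
    title_words = title.split()

    # Check that the list is non-empty before each index access:
    # an empty or whitespace-only title splits into [].
    if title_words and title_words[0] in TITLE_SPLITTERS:
        title_words.pop(0)

    # Re-check, because the pop above may have emptied the list.
    if title_words and title_words[-1] in TITLE_SPLITTERS:
        title_words.pop(-1)

    return " ".join(title_words).strip()
```

With this guard, a call such as `strip_edge_splitters("")` returns an empty string instead of raising, so the caller can decide how to handle a page without a usable title.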

For example, the following markup: ``` This is a link and this is some text. ``` will produce the following: ``` and this is some text. ``` It would be...

Mac OS X 10.10.5 David-Laxers-MacBook-Pro:python-goose davidlaxer$ python -V Python 2.7.10 :: Anaconda 2.3.0 (x86_64) David-Laxers-MacBook-Pro:python-goose davidlaxer$ conda -V conda 3.16.0 ``` David-Laxers-MacBook-Pro:~ davidlaxer$ git clone https://github.com/grangier/python-goose.git Cloning into 'python-goose'... remote:...

Hi all, I ran into trouble today when I tried to extract the content from some Huffington Post articles. Here are two URLs for which no 'cleaned_text' is returned after the extraction:...

Hello, I'm trying to extract an image from the content my users post. (I know it sounds a little bit odd.) I'm using a WYSIWYG editor, django-ckeditor, and when a user posts text with...