crawley

Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations.

Results 10 crawley issues

I can't find anything in the documentation about how to use MongoDB to save the crawled data. Am I missing something?

https://docs.python.org/2/library/urlparse.html#urlparse.urljoin provides a robust way to turn a relative URL into an absolute one. This fixes some issues like this one: when accessing the URL http://www1.abracom.org.br/cms/opencms/abracom/pt/associados/ we find relative links...
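The `urljoin` behaviour this proposal relies on can be sketched as follows. This is only an illustration, not crawley's code: the `absolutize` helper and the three sample hrefs are assumptions made up for the example, and the import shown is the Python 3 location of the function (`urlparse.urljoin` in Python 2).

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

# Page URL taken from the report above.
BASE_URL = "http://www1.abracom.org.br/cms/opencms/abracom/pt/associados/"


def absolutize(base_url, hrefs):
    """Resolve each href against base_url; absolute URLs pass through unchanged."""
    return [urljoin(base_url, href) for href in hrefs]


# Hypothetical hrefs, chosen only to illustrate three common cases.
links = absolutize(BASE_URL, [
    "resultado_busca.html?nomeArq=0148.html",  # relative to the current directory
    "../index.html",                           # relative to the parent directory
    "/cms/index.html",                         # relative to the site root
])
```

Plain string concatenation would only handle the first of these cases correctly, which is why delegating to `urljoin` is attractive.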

Any ideas why I'm getting...

```
ImportError: cannot import name ScopedSession
```

I'm using PyQuery, and I get wrong encoding detection for this page: http://www1.abracom.org.br/cms/opencms/abracom/pt/associados/resultado_busca.html?nomeArq=0148.html The problem is that the HTML has this meta tag: `` But the page is actually `utf-8`...
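One possible workaround for a page whose declared charset disagrees with its actual bytes is to try strict UTF-8 first and only fall back to the declared encoding if that fails. The sketch below is an assumption about how this could be handled before passing the text to a parser; it is not how PyQuery or crawley detect encodings, and `decode_html` and its parameters are hypothetical names.

```python
def decode_html(raw, declared_encoding):
    """Decode raw page bytes, preferring UTF-8 over the declared charset."""
    try:
        # Strict decoding raises on byte sequences that are not valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Fall back to what the page declares, replacing undecodable bytes.
        return raw.decode(declared_encoding, errors="replace")
```

The trade-off is that bytes which happen to be valid UTF-8 are always treated as UTF-8, even when the declared encoding was in fact correct. For text with accented characters in a single-byte encoding this is rarely a problem, because such bytes seldom form valid UTF-8 sequences.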

Hi, there are some missing dependencies on the master branch. If I try to use the shell, the following packages are missing: pymongo, couchdb, PyQt4
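A quick way to see which of the packages named above are absent from the current environment is to ask the import system without actually importing them. This is a minimal sketch; `missing_modules` is a hypothetical helper, not part of crawley, and it only handles top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names taken from the report above.
OPTIONAL_DEPENDENCIES = ["pymongo", "couchdb", "PyQt4"]

if __name__ == "__main__":
    missing = missing_modules(OPTIONAL_DEPENDENCIES)
    if missing:
        print("Missing packages: " + ", ".join(missing))
```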

I tried to use the shell command to test my XPaths, but it doesn't work. $ crawley shell http://somewebsite.com/index.html Traceback (most recent call last): File "/home/maik/.virtualenvs/crawley/bin/crawley", line 4, in manage()...

Implement a way to use a regex in the scraper's matching URLs.
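To make the request concrete, regex-based URL matching could look roughly like the sketch below. `RegexUrlMatcher` is a hypothetical class written for illustration, not an existing crawley API, and the pattern is an assumption modelled on a URL quoted elsewhere in these issues.

```python
import re


class RegexUrlMatcher:
    """Hypothetical helper: matches URLs against compiled regular expressions."""

    def __init__(self, patterns):
        # Compile once so repeated checks during a crawl stay cheap.
        self._patterns = [re.compile(pattern) for pattern in patterns]

    def matches(self, url):
        """Return True if any pattern matches somewhere in the URL."""
        return any(pattern.search(url) for pattern in self._patterns)


# Assumed pattern: result pages whose nomeArq parameter is numeric.
matcher = RegexUrlMatcher([r"/associados/resultado_busca\.html\?nomeArq=\d+\.html$"])
```

Using `search` rather than `match` lets a pattern target a fragment of the URL; anchoring with `^` and `$` remains available when a full match is wanted.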

Create a crawley project that demonstrates how to use the crawler's login and then scrape data from pages that require a session.
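Independently of crawley's own login support, the underlying idea (log in once, then reuse the session cookie for later requests) can be sketched with the standard library alone. Everything here is an assumption for illustration: the helper names, the `username` and `password` form fields, and any URL passed in are hypothetical and depend on the target site.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener():
    """Build an opener that stores cookies and sends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password):
    """Build a form-encoded POST; the field names are assumptions."""
    body = urllib.parse.urlencode({"username": username, "password": password})
    # Supplying data makes urllib send the request as a POST.
    return urllib.request.Request(login_url, data=body.encode("ascii"))
```

Opening the login request with the opener and then opening a protected page with the same opener would send back whatever session cookie the site set at login.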

Consider the possibility of making a simple web browser desktop application that allows end users to scrape web pages with a GUI. This app should show the webpage to the user and...

Bumps [sqlalchemy](https://github.com/sqlalchemy/sqlalchemy) from 0.7.8 to 1.3.0. Release notes Sourced from sqlalchemy's releases. 1.3.0 Released: March 4, 2019 [feature] [schema] Added new parameters Table.resolve_fks and MetaData.reflect.resolve_fks which when set to False...

dependencies