crawley issues

Documentation missing nosql info

I can't find anything in the documentation about how to use MongoDB to save the crawled data. Am I missing something?

Use urljoin to fix relative urls

1

https://docs.python.org/2/library/urlparse.html#urlparse.urljoin provides a robust way to turn a relative URL into an absolute one. This fixes issues like the following: when accessing http://www1.abracom.org.br/cms/opencms/abracom/pt/associados/ we find relative links...

onilton
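A minimal sketch of the `urljoin` approach suggested in the issue above, written for Python 3, where the function lives in `urllib.parse` (the Python 2 equivalent is `urlparse.urljoin`). The helper name `absolutize` and the relative links are hypothetical, chosen only for illustration; the links actually found on the page are not shown in the report.

```python
from urllib.parse import urljoin


def absolutize(base_url, links):
    """Resolve each link against base_url (hypothetical helper, not part of crawley)."""
    return [urljoin(base_url, link) for link in links]


if __name__ == "__main__":
    base = "http://www1.abracom.org.br/cms/opencms/abracom/pt/associados/"
    # Hypothetical relative links of the kinds a crawler commonly meets.
    links = [
        "resultado_busca.html?nomeArq=0148.html",  # relative to the current directory
        "../index.html",                           # relative to the parent directory
        "/cms/other.html",                         # relative to the site root
        "http://example.com/page.html",            # already absolute, left unchanged
    ]
    for url in absolutize(base, links):
        print(url)
```

Because `urljoin` leaves absolute URLs untouched, it can be applied to every extracted link without checking first whether the link is relative.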

Getting an Import error

1

Any ideas why I'm getting...

```
ImportError: cannot import name ScopedSession
```

graingerkid

Wrong encoding detection

I'm using PyQuery, and the encoding is detected incorrectly for this page: http://www1.abracom.org.br/cms/opencms/abracom/pt/associados/resultado_busca.html?nomeArq=0148.html The problem is that the HTML has this meta tag: `` But the page is actually `utf-8`...

onilton
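One possible workaround for the encoding issue above, sketched under assumptions: try strict UTF-8 first and fall back to the declared charset only when the bytes are not valid UTF-8. This is not how PyQuery or crawley detect encodings; `decode_html` is a hypothetical helper, and `iso-8859-1` stands in for whatever charset the page's meta tag declares (the tag itself is not shown in the report).

```python
def decode_html(raw, declared_encoding):
    """Decode raw page bytes, preferring UTF-8 over the declared charset.

    Hypothetical helper: valid UTF-8 is rarely produced by accident,
    so a successful strict decode is treated as the right answer.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so trust the charset the page declares.
        return raw.decode(declared_encoding, errors="replace")


if __name__ == "__main__":
    # Bytes that are really UTF-8 although the page claims otherwise.
    mislabeled = "São Paulo".encode("utf-8")
    print(decode_html(mislabeled, "iso-8859-1"))

    # Bytes that really are in the declared charset.
    latin1 = "São Paulo".encode("iso-8859-1")
    print(decode_html(latin1, "iso-8859-1"))
```

The decoded string can then be passed to the parser instead of the raw bytes, so the misleading meta tag no longer drives the decoding step.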

missing dependencies

5

Hi, there are some missing dependencies on the master branch. If I try to use the shell, the following packages are missing:

- pymongo
- couchdb
- PyQt4

MrTango
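A quick way to confirm which of the packages listed in the report above are absent from the current environment, as a rough sketch using only the standard library. `missing_modules` is a hypothetical helper, and the sketch assumes the import names match the package names given in the report.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names assumed from the package names in the report.
    print(missing_modules(["pymongo", "couchdb", "PyQt4"]))
```

`find_spec` only locates a module without importing it, so the check has no side effects from the packages themselves.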

shell doesn't work

1

I tried to use the shell command to test my XPaths, but it doesn't work.

$ crawley shell http://somewebsite.com/index.html
Traceback (most recent call last):
File "/home/maik/.virtualenvs/crawley/bin/crawley", line 4, in manage()...

MrTango

Matching urls via regex

Implement a way to use a regex in the scraper's matching urls.

jmg
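A minimal sketch of what the regex matching requested above could look like, using the standard `re` module. The function name `url_matches` and the example pattern are hypothetical; nothing here reflects crawley's actual scraper API.

```python
import re


def url_matches(url, patterns):
    """Return True if the URL matches at least one regex pattern.

    Hypothetical helper: `patterns` is a list of regex strings, and
    re.search is used so a pattern may match anywhere in the URL
    unless it is anchored with ^ and $.
    """
    return any(re.search(pattern, url) for pattern in patterns)


if __name__ == "__main__":
    # Hypothetical pattern: article pages with a numeric id.
    patterns = [r"^https?://example\.com/articles/\d+/?$"]
    print(url_matches("http://example.com/articles/42", patterns))
    print(url_matches("http://example.com/about", patterns))
```

Anchoring patterns with `^` and `$` keeps a scraper from firing on URLs that merely contain the pattern as a substring.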

Create an example project for Login.

Create a crawley project that demonstrates how to use the crawler's login and then scrape data from pages that require a session.

jmg

Desktop Client Application

Consider the possibility of making a simple web browser desktop application that lets end users scrape web pages with a GUI. This app should show the webpage to the user and...

jmg

Bump sqlalchemy from 0.7.8 to 1.3.0

Bumps [sqlalchemy](https://github.com/sqlalchemy/sqlalchemy) from 0.7.8 to 1.3.0.

Release notes (sourced from sqlalchemy's releases):

1.3.0, released March 4, 2019

[feature] [schema] Added new parameters Table.resolve_fks and MetaData.reflect.resolve_fks which when set to False...

dependabot[bot]

dependencies
