spidr issues

17 spidr issues, sorted by recently updated

Multithreading

Hi there, I was wondering if it would be possible to multithread the spidr gem? I don't know much about multithreading in Ruby, but I believe only Ruby 1.9.x is...

ethicalhack3r

feature
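Threads can still help a crawler on MRI, because most of the time goes to waiting on network I/O, and MRI releases its global VM lock during blocking I/O. Below is a minimal worker-pool sketch. It is a generic pattern, not spidr's API. `crawl_concurrently` is a hypothetical name, and the block stands in for whatever fetch call you would use.

```ruby
require 'thread'

# Hypothetical helper (not part of spidr): fetches each URL using a fixed
# number of worker threads that pull work from a shared Queue.
def crawl_concurrently(urls, workers: 4, &fetch)
  queue   = Queue.new
  results = {}
  lock    = Mutex.new

  urls.each { |url| queue << url }
  workers.times { queue << :done } # one stop marker per worker

  threads = Array.new(workers) do
    Thread.new do
      # Queue#pop blocks until an item is available.
      while (url = queue.pop) != :done
        body = fetch.call(url) # e.g. an HTTP GET; blocking I/O releases the GVL on MRI
        lock.synchronize { results[url] = body } # guard the shared Hash
      end
    end
  end

  threads.each(&:join)
  results
end
```

A shared `Queue` plus one stop marker per worker keeps shutdown simple. A real crawler would also have to push newly discovered links back onto the queue and decide when the crawl is finished.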

Add low-level HTTP request methods

Add `get`, `head`, `post`, `put`, etc. methods to `Spidr::Agent` for cases where you want the raw response instead of a `Page` object.

postmodern

feature
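Here is a minimal sketch of such low-level helpers built directly on Ruby's standard `Net::HTTP`. The names `REQUEST_CLASSES`, `build_request`, and `raw_request` are hypothetical and are not spidr's API. `raw_request` returns the plain `Net::HTTPResponse` and does not wrap it in a page object.

```ruby
require 'net/http'
require 'uri'

# Hypothetical mapping (not spidr's API): verb => Net::HTTP request class.
REQUEST_CLASSES = {
  get:  Net::HTTP::Get,
  head: Net::HTTP::Head,
  post: Net::HTTP::Post,
  put:  Net::HTTP::Put
}.freeze

# Builds a request object for the URI's path and query, with extra headers.
def build_request(verb, uri, headers = {})
  klass = REQUEST_CLASSES.fetch(verb) { raise ArgumentError, "unsupported verb: #{verb}" }
  request = klass.new(uri.request_uri)
  headers.each { |name, value| request[name] = value }
  request
end

# Sends the request and returns the raw Net::HTTPResponse (no Page wrapper).
def raw_request(verb, url, headers = {}, body: nil)
  uri = URI(url)
  request = build_request(verb, uri, headers)
  request.body = body if body
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end
```

For example, `raw_request(:head, 'http://example.com/')` would give you the status and headers without downloading or parsing a body.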

Respect base tags

Currently `<base>` tags are not taken into account, which sends the spider to the wrong URL on pages that have a base tag. With this patch, the spider correctly calculates...

ericmason
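In HTML, relative links are resolved against the document's base URL. That is the `<base href>` value when one is present, and that value is itself resolved against the page URL. Below is a minimal sketch using only the standard `uri` library. `resolve_link` is a hypothetical helper, and extracting the `href` from the HTML is assumed to happen elsewhere.

```ruby
require 'uri'

# Hypothetical helper: resolves a link the way a browser does when the page
# declares <base href="...">. A relative base href is first resolved
# against the page URL, then the link is resolved against that base.
def resolve_link(page_url, link, base_href: nil)
  base = URI(page_url)
  base = base.merge(base_href) if base_href && !base_href.empty?
  base.merge(link)
end
```

Without the base tag, `next.html` on `http://example.com/articles/2010/post.html` resolves next to the page. With `<base href="http://cdn.example.com/static/">` it resolves under the CDN path instead.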

SSL session reuse may fail

I've just run into a situation where the reuse of an SSL session caused an exception and Spidr subsequently skipped the page. Currently, the exception is silently swallowed, so I...

nirvdrum
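One way to avoid silently swallowing the failure is to log the SSL error, discard the reused connection or session, and retry a bounded number of times. The sketch below is generic and makes no assumptions about spidr's internals. `with_ssl_retry` is a hypothetical helper, and the `reset` callable stands in for whatever clears the cached connection.

```ruby
require 'openssl'
require 'logger'

# Hypothetical helper: runs the block (e.g. an HTTPS request over a reused
# connection). On an SSL error it logs the error instead of swallowing it,
# calls reset (e.g. to drop the cached session/connection) and retries.
def with_ssl_retry(logger:, attempts: 2, reset: -> {})
  tries = 0
  begin
    tries += 1
    yield
  rescue OpenSSL::SSL::SSLError => e
    logger.warn("SSL error on attempt #{tries}: #{e.class}: #{e.message}")
    raise if tries >= attempts # give up and surface the error
    reset.call
    retry
  end
end
```

Re-raising after the last attempt lets the caller record the skipped page, so the failure stays visible instead of disappearing.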

Store history queue in Hash of host:port and paths.

To reduce lookup time in `Spidr::Agent#queue`, we can store the URLs in a Hash of the unique `host:port` pair and the URL paths. This will also facilitate events for...

postmodern

feature
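Here is a minimal sketch of the proposed layout: a Hash keyed by `"host:port"`, where each value is a `Set` of request paths. A membership check then costs two hash lookups instead of a scan. `HostPathHistory` and its methods are hypothetical names, not spidr's API. As the issue proposes, the scheme is not part of the key.

```ruby
require 'set'
require 'uri'

# Hypothetical structure: URLs grouped by "host:port", each with a Set of
# request paths (path plus query string).
class HostPathHistory
  def initialize
    # Auto-create an empty Set the first time a host:port key is written.
    @paths = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Returns true if the URL was newly added, false if it was already present.
  def add?(url)
    uri = URI(url)
    !@paths[key_for(uri)].add?(uri.request_uri).nil?
  end

  # Uses fetch so that lookups never create empty entries.
  def include?(url)
    uri = URI(url)
    set = @paths.fetch(key_for(uri), nil)
    !set.nil? && set.include?(uri.request_uri)
  end

  # Per-host path lists make host-level hooks or events straightforward.
  def paths_for(host, port)
    @paths.fetch("#{host}:#{port}", Set.new).to_a
  end

  private

  def key_for(uri)
    "#{uri.host}:#{uri.port}"
  end
end
```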

handle css @import spidering

Discussed in IRC the other day. Noting it here for posterity. Could look into using http://github.com/alexdunae/css_parser for this, although there may be a more efficient path. `parser = CssParser::Parser.new...`

zapnap

feature
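The issue suggests the css_parser gem. As a lighter alternative, here is a sketch that uses a regular expression and the standard library instead. It finds `@import` targets in a stylesheet and resolves them against the stylesheet's URL. `css_imports` and `IMPORT_PATTERN` are hypothetical names. The regex covers the common `url(...)` and quoted-string forms only and is not a full CSS parser.

```ruby
require 'uri'

# Matches @import url("x"), url('x'), url(x), "x" and 'x'.
IMPORT_PATTERN = /@import\s+(?:url\(\s*(?:"([^"]*)"|'([^']*)'|([^)\s]*))\s*\)|"([^"]*)"|'([^']*)')/i

# Hypothetical helper: returns absolute URLs of stylesheets imported by css.
def css_imports(css, stylesheet_url)
  base = URI(stylesheet_url)
  without_comments = css.gsub(%r{/\*.*?\*/}m, '') # ignore commented-out imports
  without_comments.scan(IMPORT_PATTERN).map do |groups|
    base.merge(groups.compact.first).to_s # exactly one capture group matches
  end
end
```

Note that imports are resolved against the stylesheet's URL, not the HTML page's URL, which matters when the CSS lives in a different directory.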

Switch to `Addressable::URI` for URI parsing

Using `Addressable::URI` would allow spidr to handle IDN domains.

postmodern

feature
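The snippet below shows the limitation behind this issue. Ruby's standard `URI.parse` rejects non-ASCII hosts but accepts their punycode form. According to Addressable's documentation, `Addressable::URI#normalize` converts IDN hosts to punycode, which is why the switch would help. `stdlib_parses?` is a hypothetical helper used only for illustration.

```ruby
require 'uri'

# Hypothetical helper: true if Ruby's standard URI parser accepts the URL.
def stdlib_parses?(url)
  URI.parse(url)
  true
rescue URI::InvalidURIError
  false
end
```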


Owner

Metadata

spidr spidr copied to clipboard

Metadata

Multithreading

Add low-level HTTP request methods

Respect base tags

SSL session reuse may fail

Store history queue in Hash of host:port and paths.

handle css @import spidering

Switch to `Addressable::URI` for URI parsing

← Metadata

Owner

Metadata

spidr
spidr copied to clipboard