spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Results 17 spidr issues

Hi there, I was wondering if it would be possible to multithread the spidr gem? I don't know much about multithreading in Ruby, but I believe only Ruby 1.9.x is...

feature

Add `get`, `head`, `post`, `put`, etc. methods to `Spidr::Agent` for when you do not want a Page object returned, just the raw response.

feature

Currently `<base>` tags are not taken into account and will send the spider to the wrong URL on pages with a base tag. With this patch, the spider correctly calculates...
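
To illustrate the problem this issue describes, here is a minimal sketch of resolving a link against an optional base URL using only Ruby's standard `uri` library. The `resolve_link` helper and the example URLs are hypothetical and are not part of Spidr or of the patch itself; the sketch assumes the `href` of the base tag has already been extracted from the page.

```ruby
require 'uri'

# Hypothetical helper (not Spidr's API): resolves link_href the way a
# browser would, honoring the page's base href when one is present.
def resolve_link(page_url, link_href, base_href = nil)
  base = URI(page_url)
  # A base href may itself be relative, so resolve it against the page URL.
  base = base.merge(base_href) if base_href
  base.merge(link_href).to_s
end

# Without a base tag, the link resolves against the page URL.
puts resolve_link('http://example.com/a/b.html', 'c.html')
# => http://example.com/a/c.html

# With a base href of "/static/", the same link points somewhere else.
puts resolve_link('http://example.com/a/b.html', 'c.html', '/static/')
# => http://example.com/static/c.html
```

A spider that ignores the base href would follow the first URL on a page that actually intends the second.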

I've just run into a situation where the reuse of an SSL session caused an exception and Spidr subsequently skipped the page. Currently, the exception is silently swallowed, so I...

To reduce lookup time in the `Spidr::Agent#queue`, we can store the URLs in a Hash of the unique `host:port` pair and the URL paths. This will also facilitate events for...

feature
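
The queue layout proposed above could look roughly like the following sketch. The `HostQueue` class and its method names are hypothetical, not Spidr's actual implementation; it assumes HTTP(S) URLs and tracks membership only, so it does not preserve crawl order.

```ruby
require 'set'
require 'uri'

# Hypothetical sketch (not Spidr's API): queued URLs grouped by "host:port",
# with the request paths for each host held in a Set for fast lookup.
class HostQueue
  def initialize
    @paths = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Adds the URL; returns true if it was new, false if already queued.
  def enqueue(url)
    uri = URI(url)
    @paths[key_for(uri)].add?(uri.request_uri) ? true : false
  end

  # Checks membership without creating an entry for unknown hosts.
  def queued?(url)
    uri = URI(url)
    key = key_for(uri)
    @paths.key?(key) && @paths[key].include?(uri.request_uri)
  end

  # The distinct "host:port" pairs currently in the queue.
  def hosts
    @paths.keys
  end

  private

  def key_for(uri)
    "#{uri.host}:#{uri.port}"
  end
end

queue = HostQueue.new
queue.enqueue('http://example.com/a')
puts queue.queued?('http://example.com/a')      # => true
puts queue.queued?('http://example.com:8080/a') # => false
```

Because each host has its own entry, a lookup only has to hash the host key and then the path, rather than scan a flat list of every queued URL.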

Discussed in IRC the other day. Noting it here for posterity. Could look into using http://github.com/alexdunae/css_parser for this, although there may be a more efficient path. ``` parser = CssParser::Parser.new...

feature

Using `Addressable::URI` would allow spidr to handle IDN domains.

feature