spidr

A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

Results 17 spidr issues

Add optional Logging/debug output to `Spidr::Agent`. `Agent#initialize` should accept a `logger` option for passing in custom [Logger](https://rubydoc.info/stdlib/logger/Logger) compatible objects. It should also support a `logging: true|false` option, which initializes `@logger`...

feature
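
To make the proposal above concrete, here is a minimal sketch of how a `logger` / `logging` pair of options could behave. `LoggingAgentSketch` and its `visit` method are hypothetical stand-ins, not Spidr's actual `Agent`, and the choice to default to `$stderr` when `logging: true` is an assumption, since the issue text is truncated at that point.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for an agent; NOT Spidr's real implementation.
class LoggingAgentSketch
  attr_reader :logger

  # logger:  any Logger-compatible object supplied by the caller
  # logging: assumption: when true and no logger is given, log to $stderr
  def initialize(logger: nil, logging: false)
    @logger = logger || (Logger.new($stderr) if logging)
  end

  # Pretends to visit a URL; emits debug output only when a logger is set.
  def visit(url)
    @logger&.debug("visiting #{url}")
    url
  end
end

# Usage: capture the debug output in memory instead of printing it.
io = StringIO.new
agent = LoggingAgentSketch.new(logger: Logger.new(io))
agent.visit('https://example.com/')
```

Keeping `@logger` as `nil` when logging is off means the safe-navigation call costs almost nothing on the hot path, which is one possible reason to prefer it over a null-object logger.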

Add methods/options for filtering URLs by path.

feature
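
As an illustration of what path filtering might look like, the sketch below checks a URL's path against accept and reject rules. `PathFilterSketch`, its `accept:` / `reject:` options, and the rule types (String prefix, Regexp, or callable) are all assumptions made for this example and do not describe Spidr's API.

```ruby
require 'uri'

# Hypothetical path filter; the class and option names are illustrative only.
class PathFilterSketch
  def initialize(accept: [], reject: [])
    @accept = accept
    @reject = reject
  end

  # Returns true when the URL's path passes the rules.
  # Reject rules win over accept rules; an empty accept list allows everything.
  def allow?(url)
    path = URI(url).path
    path = '/' if path.empty?
    return false if @reject.any? { |rule| match?(rule, path) }

    @accept.empty? || @accept.any? { |rule| match?(rule, path) }
  end

  private

  # A rule may be a String (path prefix), a Regexp, or anything responding to #call.
  def match?(rule, path)
    case rule
    when String then path.start_with?(rule)
    when Regexp then rule.match?(path)
    else rule.call(path)
    end
  end
end

# Usage: only follow paths under /blog, but skip PDFs.
filter = PathFilterSketch.new(accept: ['/blog'], reject: [/\.pdf\z/])
filter.allow?('https://example.com/blog/post')
```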

Switch from using Ruby's `net/http` to using [async-http](https://github.com/socketry/async-http#readme). This would allow for easy connection pooling and concurrent requests, without the overhead of threads and mutexes.

improvement
http

Hi, it seems that when `$_SERVER['REQUEST_URI']` or similar is used AND the web server is configured to return custom error pages (including 200 statuses), Spidr ends up in an infinite...

needs info

Howdy. We just had a big debugging session centered around redirects, and it turned out that they were redirecting from non-www to a www.domain URL, so spidr silently failed, finding...

feature

Howdy! Just wondering if I'm implementing this right. I need to follow redirects, and there doesn't seem to be an option to toggle this, so I tried implementing it this way. It...

_Side note_: First of all, thank you for an awesome gem. Over the past years I've reached for this gem numerous times for various purposes big and small, its...

__Overview__

- Supports index files
- Supports gzipped files
- Tries common Sitemap XML locations
- With `robots: true` will try to fetch sitemap locations from `/robots.txt`
- Each found...
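
The robots.txt step in the overview above can be sketched roughly as follows. This is not the code from the issue: the helper names `sitemap_urls_from_robots` and `candidate_sitemaps` are hypothetical, and falling back to `/sitemap.xml` alone is an assumption, since the exact list of common locations is not given here. It relies only on the `Sitemap:` directive that robots.txt files may carry.

```ruby
require 'uri'

# Collects sitemap URLs advertised in a robots.txt body via "Sitemap:" lines.
# The directive name is matched case-insensitively.
def sitemap_urls_from_robots(robots_txt)
  robots_txt.each_line.map { |line| line.strip[/\Asitemap:\s*(\S+)/i, 1] }.compact
end

# Hypothetical helper: prefer sitemaps listed in robots.txt, otherwise fall
# back to /sitemap.xml on the same host (an assumed default location).
def candidate_sitemaps(base_url, robots_txt = nil)
  found = robots_txt ? sitemap_urls_from_robots(robots_txt) : []
  return found unless found.empty?

  [URI.join(base_url, '/sitemap.xml').to_s]
end

# Usage with an in-memory robots.txt body (no network access needed).
robots = <<~ROBOTS
  User-agent: *
  Disallow: /private
  Sitemap: https://example.com/sitemap_index.xml
  sitemap: https://example.com/news.xml.gz
ROBOTS

listed   = candidate_sitemaps('https://example.com/', robots)
fallback = candidate_sitemaps('https://example.com/some/page')
```

Fetching, gunzipping, and parsing the returned URLs are left out on purpose; the sketch only covers discovery.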

I'm opening this issue for the sole purpose of saying thank you so much for your hard work 🙏. Also, as a side note, I run the specs against ruby...

Automatically detecting and parsing `/sitemap.xml` might be a good way to cut down on spidering depth.

feature