spidr
A versatile Ruby web spidering library that can spider a single site, multiple domains, specific links, or crawl indefinitely. Spidr is designed to be fast and easy to use.
Add optional logging/debug output to `Spidr::Agent`. `Agent#initialize` should accept a `logger` option for passing in custom [Logger](https://rubydoc.info/stdlib/logger/Logger)-compatible objects. It should also support a `logging: true|false` option, which initializes `@logger`...
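A minimal sketch of what the proposed options might look like; the `Agent` class, `logger:`, and `logging:` keywords below illustrate the feature request and are not Spidr's current API:

```ruby
require 'logger'

# Hypothetical sketch of the proposed options; this Agent class is an
# illustration of the feature request, not Spidr's actual implementation.
class Agent
  attr_reader :logger

  def initialize(logger: nil, logging: false)
    # A caller-supplied Logger-compatible object takes precedence;
    # `logging: true` falls back to a default Logger on $stderr;
    # otherwise @logger stays nil and logging is disabled.
    @logger = logger || (Logger.new($stderr) if logging)
  end

  def visit(url)
    @logger&.debug("visiting #{url}")
    # ... actual request logic would go here ...
  end
end
```

Accepting any Logger-compatible object keeps the gem agnostic about where output goes (a file, syslog, a test spy), while the boolean gives a one-flag default for quick debugging.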
Switch from using Ruby's `net/http` to using [async-http](https://github.com/socketry/async-http#readme). This would allow for easy connection pooling and concurrent requests, without the overhead of threads and mutexes.
Hi, it seems that when `$_SERVER['REQUEST_URI']` or similar is used AND the web server is configured to return custom error pages (including 200 statuses), Spidr ends up in an infinite...
Howdy. We just had a big debugging session centered around redirects: it turned out the site was redirecting from a non-www to a www.domain URL, so spidr silently failed, finding...
Howdy! Just wondering if I'm implementing this right. I need to follow redirects, and there doesn't seem to be an option toggle, so I tried implementing it this way. It...
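One common workaround when a crawler offers no redirect toggle is to follow `Location` headers by hand. The sketch below is illustrative, not part of Spidr; the injectable `fetcher` keyword exists only so the hop-following logic can be exercised without a network:

```ruby
require 'net/http'
require 'uri'

# Minimal, hypothetical sketch of following HTTP redirects manually.
# A hop budget (`limit`) guards against redirect loops like the
# custom-error-page one described above.
def fetch_following_redirects(url, limit: 5,
                              fetcher: ->(u) { Net::HTTP.get_response(URI(u)) })
  raise 'too many redirects' if limit.zero?

  response = fetcher.call(url)
  if response.is_a?(Net::HTTPRedirection)
    # Follow the Location header, decrementing the hop budget.
    fetch_following_redirects(response['location'],
                              limit: limit - 1, fetcher: fetcher)
  else
    response
  end
end
```

This mirrors the non-www → www case from the earlier report: a 301 hop is followed transparently instead of being treated as a dead end.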
_Side note_: First of all, thank you for an awesome gem. Over the past years I've reached for this gem numerous times for various purposes big and small; its...
__Overview__

- Supports index files
- Supports gzipped files
- Tries common Sitemap XML locations
- With `robots: true` will try to fetch sitemap locations from `/robots.txt`
- Each found...
I'm opening this issue for the sole purpose of saying thank you so much for your hard work 🙏. Also, as a side note, I ran the specs against ruby...
Automatically detecting and parsing `/sitemap.xml` might be a good way to cut down on spidering depth.
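The parsing side of that idea can be sketched with the stdlib alone: detect a gzipped sitemap by its magic bytes, inflate it, and collect every `<loc>` URL. The module and method names below are illustrative, not Spidr's API:

```ruby
require 'rexml/document'
require 'zlib'
require 'stringio'

# Hypothetical sitemap-parsing sketch, covering plain and gzipped
# sitemaps as well as <sitemapindex> files (whose entries also use <loc>).
module SitemapSketch
  GZIP_MAGIC = "\x1f\x8b".b

  # Inflate the payload only if it carries the gzip magic bytes.
  def self.inflate(data)
    return data unless data.b.start_with?(GZIP_MAGIC)

    Zlib::GzipReader.new(StringIO.new(data)).read
  end

  # Walk the XML tree and return the text of every <loc> element.
  # Matching on the local element name sidesteps the default
  # sitemaps.org namespace.
  def self.extract_urls(data)
    doc  = REXML::Document.new(inflate(data))
    urls = []
    walk(doc.root) { |el| urls << el.text.strip if el.name == 'loc' }
    urls
  end

  def self.walk(element, &block)
    block.call(element)
    element.elements.each { |child| walk(child, &block) }
  end
end
```

Seeding the crawl queue from these URLs would let the agent reach deep pages directly instead of discovering them link by link, which is the depth saving the suggestion is after.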