spider

Spider is a Web spidering library for Ruby. It handles the robots.txt, scraping, collecting, and looping so that you can just handle the data.

Results 4 spider issues

Would it be possible to make Spider aware of a link's `rel` attribute, such as `nofollow`?
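As a rough illustration of what such a check might involve: `rel` holds a space-separated, case-insensitive list of tokens, so the test reduces to looking for `nofollow` among them. The helper name `follow_link?` is hypothetical and is not part of Spider's API; how the `rel` value would be extracted from the page is left out here.

```ruby
# Hypothetical helper, not part of Spider: decides whether a link should be
# followed, given the raw value of its `rel` attribute (nil when absent).
def follow_link?(rel_value)
  # `rel` is a whitespace-separated token list; compare case-insensitively.
  tokens = rel_value.to_s.downcase.split
  !tokens.include?('nofollow')
end
```

A crawler could call this for each extracted link and skip enqueueing the URL when it returns `false`.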

The function `generate_next_urls` scans every page, effectively downloading and loading *every* page into memory. This may not be a problem for small files, but it's completely inefficient, and makes...

I made a crawler, and it scans the same URL more than once. Here is an example:

~~~
- https://www.jared.com/diamond-engagement-ring-78-carat-tw-roundcut-18k-white-gold/p/#
- https://www.jared.com/diamond-engagement-ring-78-carat-tw-roundcut-18k-white-gold/p/#skiptonavigation
- https://www.jared.com/diamond-engagement-ring-78-carat-tw-roundcut-18k-white-gold/p/#skip-to-content
~~~
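The URLs in this report differ only in their fragment (the part after `#`), which is never sent to the server, so they all name the same page. One possible workaround, sketched below with Ruby's standard `uri` library, is to strip the fragment before checking whether a URL has already been visited. The helper names `strip_fragment` and `unique_pages` are assumptions made for illustration and are not part of Spider.

```ruby
require 'uri'

# Hypothetical helper: returns the URL without its fragment.
def strip_fragment(url)
  uri = URI.parse(url)
  uri.fragment = nil # drops "#..." including a bare trailing "#"
  uri.to_s
end

# Hypothetical helper: collapses URLs that differ only by fragment.
def unique_pages(urls)
  urls.map { |url| strip_fragment(url) }.uniq
end
```

Applied to the three URLs in the report, `unique_pages` would leave a single entry ending in `/p/`, so the page would be fetched once.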

Right now, it only parses HTML to get the URLs, and while I have written code that parses JS (both inside an HTML file and in asset files), and gets all...