spider issues

Follow vs nofollow links

Would it be possible to make Spider aware of a link's `rel` attribute, for example `nofollow`?
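To make the request concrete, here is a minimal sketch of link extraction that honours `rel="nofollow"`. It is written in Python with the standard library only and is not Spider's code; the names `LinkExtractor`, `extract_links`, and `respect_nofollow` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags, optionally skipping nofollow links."""

    def __init__(self, base_url, respect_nofollow=True):
        super().__init__()
        self.base_url = base_url
        self.respect_nofollow = respect_nofollow
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel is a space-separated token list, e.g. rel="nofollow noopener".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if self.respect_nofollow and "nofollow" in rel_tokens:
            return
        # Resolve relative links against the page URL.
        self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url, respect_nofollow=True):
    """Return absolute link targets found in html (hypothetical helper)."""
    parser = LinkExtractor(base_url, respect_nofollow)
    parser.feed(html)
    parser.close()
    return parser.links
```

Keeping the behaviour behind a flag would let callers who want the current behaviour (follow everything) opt out.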

Inefficient URL scanning

The function `generate_next_urls` scans every page, effectively downloading and loading *every* page into memory. This may not be a problem for small files, but it's completely inefficient, and makes...

apfeltee
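One possible mitigation, sketched here as an assumption rather than as the fix the reporter had in mind, is to inspect the response headers before reading the body and to skip anything that is not HTML or that declares a very large size. The function name `should_fetch_body` and the `max_bytes` default are hypothetical.

```python
def should_fetch_body(headers, max_bytes=2_000_000):
    """Decide from response headers whether the body is worth downloading."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Only HTML can contain links to follow; a missing Content-Type is skipped.
    media_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    if media_type not in ("text/html", "application/xhtml+xml"):
        return False

    # Skip bodies that declare a size above the limit.
    # Absent or malformed Content-Length values are allowed through.
    length = lowered.get("content-length", "").strip()
    if length.isdigit() and int(length) > max_bytes:
        return False
    return True
```

A crawler could call this after a `HEAD` request, or after reading only the headers of a streamed `GET`, so that large binary files never reach memory.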

Parse URL twice when found anchor

I made a crawler that scans the same URL several times. Here is an example:

~~~
- https://www.jared.com/diamond-engagement-ring-78-carat-tw-roundcut-18k-white-gold/p/#
- https://www.jared.com/diamond-engagement-ring-78-carat-tw-roundcut-18k-white-gold/p/#skiptonavigation
- https://www.jared.com/diamond-engagement-ring-78-carat-tw-roundcut-18k-white-gold/p/#skip-to-content
~~~

madeindjs
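The three URLs above differ only in their fragment (the part after `#`), which is never sent to the server, so they all fetch the same page. A minimal sketch of deduplicating on the fragment-free URL, in Python using `urllib.parse.urldefrag`; the helper names `strip_fragment` and `unique_pages` are hypothetical and this is not Spider's code.

```python
from urllib.parse import urldefrag


def strip_fragment(url):
    """Return url without its '#fragment' part."""
    return urldefrag(url).url


def unique_pages(urls):
    """Return fragment-free URLs in first-seen order, without duplicates."""
    seen = set()
    result = []
    for url in urls:
        page = strip_fragment(url)
        if page not in seen:
            seen.add(page)
            result.append(page)
    return result
```

Applied to the example, `unique_pages` yields a single URL ending in `/p/`, so the page would be fetched once.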

Is there a way to add own URLs down the road while crawling is taking place?

Right now, it only parses HTML to get the URLs, and while I have written code that parses JS (both inside an HTML file and in asset files), and gets all...

arslanaly47
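What the question asks for amounts to a crawl queue that accepts new URLs while it is being consumed. Below is a minimal sketch in Python, assuming a single-threaded crawler; `Frontier`, `crawl`, and the `fetch_links` callback are hypothetical names and do not reflect Spider's actual API.

```python
from collections import deque


class Frontier:
    """FIFO queue of URLs that ignores anything already queued or visited."""

    def __init__(self, seeds=()):
        self._queue = deque()
        self._seen = set()
        for url in seeds:
            self.add(url)

    def add(self, url):
        """Queue url unless it was seen before; return True if it was queued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def __iter__(self):
        # The queue is re-checked on every step, so URLs added mid-crawl are visited.
        while self._queue:
            yield self._queue.popleft()


def crawl(frontier, fetch_links):
    """Visit every queued URL; fetch_links(url) returns further URLs to queue."""
    visited = []
    for url in frontier:
        visited.append(url)
        for link in fetch_links(url):
            frontier.add(link)
    return visited
```

Because `add` is public, a caller's own code (for example, a JS parser run on each response) could push extra URLs into the same frontier during the crawl.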
