crawly: [question] scraping dynamic URLs


Open • mario-mazo opened this issue 3 years ago • 1 comment

Hello

I'm thinking about using Crawly for a project, but I'm not sure what the best way is to scrape dynamic URLs.

For example, I need to scrape www.something.com/site/AAA all the way through www.something.com/site/ZZZ, where the final AAA-ZZZ segment is a unique identifier.

Should I pass the identifiers to start_urls, or should I generate them inside parse_item?

Thanks

Jul 05 '21 22:07 mario-mazo

There are two ways:

1. Add them all to start_urls up front.
2. Add them incrementally in parse_item.
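A minimal sketch of method 1, assuming the identifiers are exactly three uppercase ASCII letters (the issue only shows AAA and ZZZ) and that the site uses https (the question gives no scheme). Crawly itself is an Elixir library; the Python below only illustrates how the full URL list could be enumerated before being handed to start_urls. The helper name all_site_urls is hypothetical.

```python
import itertools
import string

# Placeholder base URL from the question; the https scheme is an assumption.
BASE_URL = "https://www.something.com/site/"


def all_site_urls(base_url: str = BASE_URL, length: int = 3) -> list:
    """Build every URL from <base>AAA through <base>ZZZ (hypothetical helper).

    Assumes identifiers are `length` uppercase ASCII letters.
    """
    # itertools.product yields tuples in lexicographic order: AAA, AAB, ..., ZZZ
    return [
        base_url + "".join(letters)
        for letters in itertools.product(string.ascii_uppercase, repeat=length)
    ]


urls = all_site_urls()
print(len(urls))  # 17576 (26 ** 3)
print(urls[0])    # https://www.something.com/site/AAA
print(urls[-1])   # https://www.something.com/site/ZZZ
```

Under that assumption the range is 26^3 = 17,576 URLs, which is small enough to build in memory and pass up front.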

Which method to use depends on how many URL permutations you are dealing with. If the number is absurdly high (say, hundreds of thousands of URLs), go with method 2.
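A minimal sketch of method 2, again assuming three-letter uppercase identifiers and an https base URL: start_urls would hold only the first URL, and each parsed page schedules the page with the next identifier. In Crawly, parse_item can return follow-up requests alongside the scraped items; the Python below only models the "compute the next identifier" step and a simulated crawl loop. The names next_identifier and follow_up_urls are hypothetical.

```python
from typing import List, Optional
import string

# Assumptions: identifiers are three uppercase ASCII letters; the scheme is https.
ALPHABET = string.ascii_uppercase
BASE_URL = "https://www.something.com/site/"


def next_identifier(ident: str) -> Optional[str]:
    """Return the identifier after `ident` (AAA -> AAB, AAZ -> ABA), or None after ZZZ."""
    chars = list(ident)
    i = len(chars) - 1
    while i >= 0:
        pos = ALPHABET.index(chars[i])
        if pos + 1 < len(ALPHABET):
            chars[i] = ALPHABET[pos + 1]
            return "".join(chars)
        # This position overflows: reset it to 'A' and carry one position to the left
        chars[i] = ALPHABET[0]
        i -= 1
    return None  # the carry ran past the leftmost letter, so ZZZ was the last one


def follow_up_urls(url: str) -> List[str]:
    """Hypothetical stand-in for the request part of parse_item: the next page's URL, if any."""
    ident = url.rsplit("/", 1)[-1]
    nxt = next_identifier(ident)
    return [] if nxt is None else [BASE_URL + nxt]


# Simulate the crawl: start from a single URL and follow one new request per page.
queue = [BASE_URL + "AAA"]
visited = 0
while queue:
    current = queue.pop()
    visited += 1
    queue.extend(follow_up_urls(current))

print(visited)  # 17576
```

In this sketch only one pending URL exists at any time, so the full list is never materialized; the trade-off is that each request is discovered only after the previous page has been parsed, which limits how many pages can be fetched concurrently.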

Sep 07 '21 17:09 Ziinc
