
[question] scraping dynamic URLs

Open • mario-mazo opened this issue 3 years ago • 1 comment

Hello

I'm thinking about using Crawly for a project, but I'm not sure what the best way is to scrape dynamic URLs.

For example, I need to scrape www.something.com/site/AAA all the way through www.something.com/site/ZZZ, where the last path segment (AAA to ZZZ) is a unique identifier.
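A quick way to see how large this URL space is. The sketch assumes the identifier is always exactly three uppercase ASCII letters and that the site uses `https://`; the question states neither explicitly. It is written in Python purely for illustration, not with Crawly's Elixir API:

```python
from itertools import product
from string import ascii_uppercase

# Assumptions: identifiers are exactly three uppercase letters (AAA..ZZZ),
# and the placeholder host from the question is served over https.
BASE = "https://www.something.com/site/"

def all_urls():
    """Yield every URL from BASE + 'AAA' to BASE + 'ZZZ' in lexicographic order."""
    for letters in product(ascii_uppercase, repeat=3):
        yield BASE + "".join(letters)

urls = list(all_urls())
print(len(urls))  # 17576, i.e. 26 ** 3
print(urls[0])    # https://www.something.com/site/AAA
print(urls[-1])   # https://www.something.com/site/ZZZ
```

Under that assumption the full range is 17,576 URLs, small enough to materialize as a list up front.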

Should I pass the identifiers to `start_urls`, or should I fetch them inside `parse_item`?

Thanks!

mario-mazo • Jul 05 '21 22:07

There are two ways:

  1. Add them all to `start_urls` up front.
  2. Add them incrementally from `parse_item`.

Which method to use depends on how many URL permutations you are looking at. If the number of URLs is absurdly high (like hundreds of thousands), go with method 2.
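A rough sketch of method 2, again in Python for illustration rather than Crawly's Elixir API: each parsed page returns its items plus a request for only the *next* identifier, so the queue never holds more than a handful of URLs. `next_identifier`, this `parse_item(url)` signature, the `https://` scheme, and the three-uppercase-letter identifier format are all hypothetical. In Crawly, the follow-up URLs would instead go into the `requests` field of the `%Crawly.ParsedItem{}` that the spider's `parse_item/1` returns.

```python
from string import ascii_uppercase

BASE = "https://www.something.com/site/"  # hypothetical scheme + placeholder host

def next_identifier(ident):
    """Return the identifier after `ident` in AAA..ZZZ order, or None after ZZZ.

    Treats the identifier as a base-26 number whose digits are A..Z.
    """
    chars = list(ident)
    i = len(chars) - 1
    while i >= 0:
        if chars[i] != "Z":
            chars[i] = ascii_uppercase[ascii_uppercase.index(chars[i]) + 1]
            return "".join(chars)
        chars[i] = "A"  # this position wraps; carry into the one to its left
        i -= 1
    return None  # wrapped past ZZZ: the range is exhausted

def parse_item(url):
    """Hypothetical stand-in for a spider's parse callback.

    Returns (items, follow_up_urls): scrape the current page, then schedule
    just the next identifier instead of queueing every URL up front.
    """
    ident = url.rsplit("/", 1)[-1]
    items = [{"id": ident}]  # placeholder for real extraction logic
    nxt = next_identifier(ident)
    requests = [BASE + nxt] if nxt else []
    return items, requests

# Drive the crawl from a single start URL, the way a scheduler would.
queue = [BASE + "AAA"]
seen = 0
while queue:
    items, requests = parse_item(queue.pop())
    seen += len(items)
    queue.extend(requests)
print(seen)  # 17576
```

One trade-off of chaining strictly one URL per page: the crawl is serialized, and in a real crawler a page that never gets parsed (e.g. a failed request) would end the chain there. Scheduling a small batch of upcoming identifiers per page is a common middle ground between this and method 1.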

Ziinc • Sep 07 '21 17:09