crawly
[question] scraping dynamic URLs
Hello
I'm thinking about using crawly for a project, but I'm not sure what the best way to scrape dynamic URLs is.
For example, I need to scrape www.something.com/site/AAA all the way to www.something.com/site/ZZZ. The trailing AAA-ZZZ part is a unique identifier.
So should I pass every identifier to `start_urls`, or should I request them from inside `parse_item`?
Thanks
There are two ways:
- Adding them all to `start_urls`
- Incrementally adding them in `parse_item`
Which method to use depends on how many URL permutations you are looking at. If the number is absurdly high (hundreds of thousands of URLs), go with method 2.
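To make the difference concrete, here is a rough, framework-agnostic sketch in Python of the identifier logic behind both methods. It is not crawly code. It assumes the identifier is exactly three uppercase letters A-Z and that the site is served over `http://`; `all_urls` and `next_identifier` are hypothetical helper names. The first builds the whole list up front (what you would hand to `start_urls`), the second computes only the following identifier (what you would call from `parse_item` to emit one new request per page).

```python
from itertools import product
from string import ascii_uppercase

# Host and path come from the question; the scheme is an assumption.
BASE_URL = "http://www.something.com/site/"


def all_urls(length=3):
    """Method 1: build every URL up front (26**length entries)."""
    return [
        BASE_URL + "".join(letters)
        for letters in product(ascii_uppercase, repeat=length)
    ]


def next_identifier(ident):
    """Method 2: return the identifier after `ident`, or None after the last one."""
    chars = list(ident)
    # Walk from the rightmost letter, carrying over whenever a "Z" wraps to "A".
    for i in range(len(chars) - 1, -1, -1):
        if chars[i] != "Z":
            chars[i] = chr(ord(chars[i]) + 1)
            return "".join(chars)
        chars[i] = "A"
    return None  # every position was "Z", so there is nothing left to crawl
```

With three letters there are 26^3 = 17,576 URLs, which is small enough to list up front. With longer identifiers the list grows quickly, and computing the next identifier on each parsed page avoids holding all of them in memory at once.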