scrapy-html-storage
Add configuration for saving HTML when URL meets a certain pattern
Hello @povilasb!
As the title says, I was wondering whether we could implement something like this.
My use case: when a spider crawls a site with many different kinds of pages, and the crawl logic is scattered across different parts of the code, manually setting save_html in the meta of every request gets tedious. I'm proposing that the middleware could instead save the HTML automatically whenever the URL matches one of a set of configured patterns.
For example:
HTML_STORAGE = {
    'save_on_url_patterns': [
        r'website\.com/page\d+\.html',
        r'website\.com/section-\w+\.html',
    ],
}
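To make the idea concrete, here is a rough sketch of the matching logic, not a finished implementation. It assumes the proposed 'save_on_url_patterns' key (which does not exist in scrapy-html-storage today) and that the patterns are matched with re.search against the full URL. The helper names compile_url_patterns and should_save_html are hypothetical.

```python
import re

# Hypothetical settings dict mirroring the example above; the
# 'save_on_url_patterns' key is only a proposal, not an existing option.
HTML_STORAGE = {
    'save_on_url_patterns': [
        r'website\.com/page\d+\.html',
        r'website\.com/section-\w+\.html',
    ],
}


def compile_url_patterns(settings):
    """Compile the configured patterns once, e.g. when the middleware starts."""
    return [re.compile(p) for p in settings.get('save_on_url_patterns', [])]


def should_save_html(url, compiled_patterns):
    """Return True if any configured pattern is found anywhere in the URL."""
    return any(p.search(url) for p in compiled_patterns)


patterns = compile_url_patterns(HTML_STORAGE)
print(should_save_html('http://website.com/page12.html', patterns))
print(should_save_html('http://website.com/about.html', patterns))
```

The middleware could presumably run this check alongside the existing save_html meta flag, so requests that set the flag explicitly would keep working as they do now.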
Let me know what you think, and I can send a PR.
Cheers!