scrapy
Remove extra `spider` arguments from APIs where the spider is already available
As a continuation of #5090, we should simplify methods that take a spider argument even though they already have access to a spider instance, usually via a crawler instance. Calling from_crawler() isn't yet mandatory for some built-in components, since from_settings() will only be deprecated in 2.13.0, but this just means we will do this in stages.
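A minimal sketch of the kind of simplification meant here. It uses hypothetical stand-in classes, not Scrapy's real ones: `Spider`, `Crawler`, `OldStyleComponent` and `NewStyleComponent` are illustrative only. The old-style method needs the caller to pass `spider`. The new-style component is built via `from_crawler()` and reads the spider from the stored crawler.

```python
from __future__ import annotations


class Spider:
    """Hypothetical stand-in for a spider; only carries a name."""

    def __init__(self, name: str) -> None:
        self.name = name


class Crawler:
    """Hypothetical stand-in for a crawler that owns a spider instance."""

    def __init__(self, spider: Spider) -> None:
        self.spider = spider


class OldStyleComponent:
    """Before: the caller has to pass the spider explicitly."""

    def process(self, item: dict, spider: Spider) -> dict:
        return {**item, "spider": spider.name}


class NewStyleComponent:
    """After: the spider is reached through the crawler given to from_crawler()."""

    def __init__(self, crawler: Crawler) -> None:
        self.crawler = crawler

    @classmethod
    def from_crawler(cls, crawler: Crawler) -> NewStyleComponent:
        return cls(crawler)

    def process(self, item: dict) -> dict:
        # No spider parameter: the instance is already reachable via the crawler.
        return {**item, "spider": self.crawler.spider.name}


crawler = Crawler(Spider("example"))
old = OldStyleComponent().process({"id": 1}, crawler.spider)
new = NewStyleComponent.from_crawler(crawler).process({"id": 1})
print(old == new)  # True
```

Both calls produce the same result. The second one simply no longer asks the caller for something the component can already reach.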
#6750 is a work-in-progress part of this, and I'm currently looking into the Scraper code. For some public APIs this may not be possible without breaking backward compatibility, but it can be done when we add newer (e.g. async) APIs.
There is a caveat with the crawler approach: crawler.spider is only set in Crawler.crawl(), but most of the actual core and component code should be executed after that.
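A sketch of the ordering caveat, again with hypothetical stand-ins. This `Crawler` is assumed to leave `spider` as `None` until `crawl()` is called, mirroring the behaviour described above; the real `Crawler.crawl()` is asynchronous and does much more. A component can safely store the crawler when it is created, but it should only read `crawler.spider` later, when its methods are actually called.

```python
from __future__ import annotations


class Spider:
    """Hypothetical stand-in spider."""

    def __init__(self, name: str) -> None:
        self.name = name


class Crawler:
    """Hypothetical stand-in: spider stays None until crawl() creates it."""

    def __init__(self, spider_name: str) -> None:
        self.spider_name = spider_name
        self.spider: Spider | None = None

    def crawl(self) -> None:
        # Only here does the spider instance become available.
        self.spider = Spider(self.spider_name)


class Component:
    def __init__(self, crawler: Crawler) -> None:
        # Storing the crawler is safe; reading crawler.spider here is not,
        # because crawl() may not have been called yet.
        self.crawler = crawler

    @property
    def spider(self) -> Spider:
        if self.crawler.spider is None:
            raise RuntimeError("crawler.spider is not set before Crawler.crawl()")
        return self.crawler.spider

    def describe(self) -> str:
        return f"running for {self.spider.name}"


crawler = Crawler("example")
component = Component(crawler)  # created before crawl()
try:
    component.describe()
except RuntimeError as exc:
    print("too early:", exc)
crawler.crawl()
print(component.describe())  # running for example
```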
Another interesting case is the various open_spider() / close_spider() methods, where passing the spider instance is usually not needed. These need case-by-case review, though: for example, components managed by MiddlewareManager subclasses will need to keep the spider argument until and unless we mandate from_crawler() for all of them.
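A sketch of why the manager case is harder. It uses hypothetical classes only; `Manager` is an illustrative MiddlewareManager-like dispatcher, not Scrapy's implementation. A component created without a crawler (e.g. via from_settings()) has no other way to reach the spider. As long as such components exist, the dispatcher has to keep passing the argument to every component.

```python
from __future__ import annotations


class Spider:
    def __init__(self, name: str) -> None:
        self.name = name


class Crawler:
    def __init__(self, spider: Spider) -> None:
        self.spider = spider


class LegacyComponent:
    """Created without a crawler: the spider argument is its only source."""

    def open_spider(self, spider: Spider) -> str:
        return f"legacy opened {spider.name}"


class CrawlerAwareComponent:
    """Created via from_crawler(): the argument is redundant for it."""

    def __init__(self, crawler: Crawler) -> None:
        self.crawler = crawler

    @classmethod
    def from_crawler(cls, crawler: Crawler) -> CrawlerAwareComponent:
        return cls(crawler)

    def open_spider(self, spider: Spider) -> str:
        # Ignores the argument but keeps it for a uniform call signature.
        return f"crawler-aware opened {self.crawler.spider.name}"


class Manager:
    """Hypothetical MiddlewareManager-like dispatcher."""

    def __init__(self, components: list) -> None:
        self.components = components

    def open_spider(self, spider: Spider) -> list[str]:
        # The argument cannot be dropped here while any component,
        # such as LegacyComponent, still depends on it.
        return [c.open_spider(spider) for c in self.components]


crawler = Crawler(Spider("example"))
manager = Manager([LegacyComponent(), CrawlerAwareComponent.from_crawler(crawler)])
print(manager.open_spider(crawler.spider))
```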
Hey, I am new to coding and using Python, but I want to get more experience with VS Code. I am going to work on this!
@shmrl Are you working on this? If not, I need to contribute to an open-source project for a school assignment, and I would like to work on this. Thanks :)
I've been busy with school! Please go ahead!
Done, except for scrapy.logformatter.LogFormatter (which is possible to refactor but isn't a priority and shouldn't force this ticket to be kept open).