scrapy-selenium
Prevent kwargs override on replace
Background
In scrapy's Request class, the following function is defined:
def replace(self, *args, **kwargs):
    """Create a new Request with the same attributes except for those
    given new values.
    """
    for x in ['url', 'method', 'headers', 'body', 'cookies', 'meta', 'flags',
              'encoding', 'priority', 'dont_filter', 'callback', 'errback', 'cb_kwargs']:
        kwargs.setdefault(x, getattr(self, x))
    cls = kwargs.pop('cls', self.__class__)
    return cls(*args, **kwargs)
Since SeleniumRequest inherits from Request, calling .replace() on a SeleniumRequest object defers to the superclass implementation. As the snippet above shows, the new request is constructed from a fixed list of attributes only. That list does not include SeleniumRequest's additional attributes, such as wait_time and wait_until. Thus, after a replace call, these attributes are reset to None unless they are passed again explicitly, which can lead to errors and unexpected behavior (for example, a configured wait being silently dropped).
This PR fixes that issue.
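The problem and one possible remedy can be sketched with simplified stand-in classes. This is an illustration only: the classes below are not the real scrapy or scrapy-selenium code, the attribute lists are shortened, and PatchedSeleniumRequest is a hypothetical name used here to show the general idea of carrying the extra attributes over. It is not necessarily the exact change made in this PR.

```python
class Request:
    """Simplified stand-in for scrapy's Request (only a few attributes)."""

    def __init__(self, url, method="GET", meta=None):
        self.url = url
        self.method = method
        self.meta = meta or {}

    def replace(self, *args, **kwargs):
        # Same pattern as the snippet above: copy a fixed list of attributes.
        for x in ["url", "method", "meta"]:
            kwargs.setdefault(x, getattr(self, x))
        cls = kwargs.pop("cls", self.__class__)
        return cls(*args, **kwargs)


class SeleniumRequest(Request):
    """Stand-in subclass adding attributes the base class does not know about."""

    def __init__(self, wait_time=None, wait_until=None, *args, **kwargs):
        self.wait_time = wait_time
        self.wait_until = wait_until
        super().__init__(*args, **kwargs)


class PatchedSeleniumRequest(SeleniumRequest):
    """Hypothetical fix: default the extra attributes before delegating."""

    def replace(self, *args, **kwargs):
        for x in ["wait_time", "wait_until"]:
            kwargs.setdefault(x, getattr(self, x))
        return super().replace(*args, **kwargs)


# Without the fix, wait_time is lost on replace().
original = SeleniumRequest(url="https://example.com", wait_time=10)
lost = original.replace(url="https://example.com/page2")

# With the fix, wait_time survives unless the caller overrides it.
patched = PatchedSeleniumRequest(url="https://example.com", wait_time=10)
kept = patched.replace(url="https://example.com/page2")
```

Using setdefault in the override mirrors the base class: values the caller passes to replace() still win, and only the missing ones are filled in from the existing request.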
Hi, this PR is very important to me. Is it possible to review and merge it? I tested with scrapy v1.6.0 and v2.9.0 and the issue persists :). I spent many days working out what was happening.