
Prevent kwargs override on replace

Open sushinoya opened this issue 3 years ago • 1 comments

Background

In Scrapy's Request class, the following method is defined:

    def replace(self, *args, **kwargs):
        """Create a new Request with the same attributes except for those
        given new values.
        """
        for x in ['url', 'method', 'headers', 'body', 'cookies', 'meta', 'flags',
                  'encoding', 'priority', 'dont_filter', 'callback', 'errback', 'cb_kwargs']:
            kwargs.setdefault(x, getattr(self, x))
        cls = kwargs.pop('cls', self.__class__)
        return cls(*args, **kwargs)

Since SeleniumRequest inherits from Request, calling .replace() on a SeleniumRequest object defers to the superclass implementation. As the snippet above shows, the new request is constructed from a fixed list of attributes. That list does not include SeleniumRequest's additional attributes, such as wait_time and wait_until. Thus, after a replace call, these attributes are reset to None, so the configured waits are silently dropped, which can lead to errors and unexpected behavior.

This PR fixes that issue.
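
The idea can be sketched as follows. This is a minimal illustration, not the actual diff: the `Request` class here is a small stand-in for Scrapy's (with a shortened attribute list), and `SeleniumRequest` is simplified to take `wait_time` and `wait_until` as keyword-only arguments. The subclass overrides `replace()` so that its own attributes are carried over unless the caller passes new values.

```python
class Request:
    """Stand-in for scrapy.Request (illustrative only, not the real class)."""

    def __init__(self, url, callback=None, meta=None):
        self.url = url
        self.callback = callback
        self.meta = dict(meta) if meta else {}

    def replace(self, *args, **kwargs):
        # Same shape as the snippet above, with a shorter attribute list.
        for x in ['url', 'callback', 'meta']:
            kwargs.setdefault(x, getattr(self, x))
        cls = kwargs.pop('cls', self.__class__)
        return cls(*args, **kwargs)


class SeleniumRequest(Request):
    """Simplified stand-in carrying the extra attributes named in this issue."""

    def __init__(self, *args, wait_time=None, wait_until=None, **kwargs):
        self.wait_time = wait_time
        self.wait_until = wait_until
        super().__init__(*args, **kwargs)

    def replace(self, *args, **kwargs):
        # Only forward the extra attributes when the target class accepts them;
        # a caller may pass cls=Request to downgrade to a plain request.
        cls = kwargs.get('cls', self.__class__)
        if issubclass(cls, SeleniumRequest):
            for x in ['wait_time', 'wait_until']:
                # setdefault keeps any value the caller passed explicitly.
                kwargs.setdefault(x, getattr(self, x))
        return super().replace(*args, **kwargs)


req = SeleniumRequest(url='https://example.com', wait_time=10)
new_req = req.replace(url='https://example.org')
print(new_req.wait_time)
```

Using `setdefault` matches the base implementation: values the caller passes to `replace()` win, and everything else is copied from the existing request.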

sushinoya avatar Apr 12 '21 16:04 sushinoya

Hi, this PR is very important for me. Is it possible to review and merge it? I tested with Scrapy v1.6.0 and v2.9.0 and the issue persists :). I spent many days working out what was happening.

gsi-luis avatar Jul 23 '23 21:07 gsi-luis