
Retrying request download causes new pages to be opened


Hi, thanks for the useful project.

I have a middleware that retries a request up to X times, each time using a new proxy. This means `_download_request` is called X times and, as a result, X pages are created.
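For illustration, here's a stripped-down version of the kind of middleware I mean (the proxy pool, status codes, and retry limit are placeholders, not my actual setup):

```python
import random

# Placeholder values; in practice these would come from settings or a
# proxy rotation library.
PROXY_POOL = ["http://proxy1:8080", "http://proxy2:8080", "http://proxy3:8080"]
MAX_RETRIES = 3

class ProxyRetryMiddleware:
    """Downloader middleware: retry bad responses with a fresh proxy."""

    def process_response(self, request, response, spider):
        if response.status in (403, 429, 503):
            retries = request.meta.get("retry_times", 0)
            if retries < MAX_RETRIES:
                retry = request.copy()
                retry.meta["retry_times"] = retries + 1
                retry.meta["proxy"] = random.choice(PROXY_POOL)
                retry.dont_filter = True
                # Returning a Request re-schedules it, so the download
                # handler's _download_request runs again -- and with
                # scrapy-pyppeteer, each run opens another page.
                return retry
        return response
```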

I am using a forked version of your project that closes all existing pages before opening a new one. I'm wondering what you think of this solution, and whether you'd be interested in a contribution?
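Roughly, the fork does something like this before each download attempt (simplified; I'm showing it as a standalone helper rather than the actual handler code):

```python
import asyncio
from pyppeteer import launch

async def open_fresh_page(browser):
    # Close every page the browser currently has open, then start a new
    # one, so at most one page exists per download attempt.
    for page in await browser.pages():
        await page.close()
    return await browser.newPage()

async def main():
    browser = await launch()
    page = await open_fresh_page(browser)
    await page.goto("https://example.org")
    print(await page.title())
    await browser.close()

asyncio.run(main())
```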

I'm also wondering whether it would be possible to reuse the same page across retries? I'm new to pyppeteer.

For reference, this is the proxy rotation middleware I'm using: https://github.com/TeamHG-Memex/scrapy-rotating-proxies

nichoi avatar Nov 30 '20 09:11 nichoi

Hi, thanks for your interest in this project. Is this an issue of performance, memory usage, or both? Or something else? To be honest, I didn't expect this to be a problem: pages are relatively short-lived, since they're closed right after the response content is read. Closing all pages before opening a new one doesn't sound right from a concurrency standpoint; with concurrent requests in flight, you'd be closing pages that other downloads are still using. I'd be more inclined to add a way to ask the handler to reuse a certain page, something like the sketch below. I'd be interested in seeing how you modified the handler, though. Thanks again!
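From the spider's side, this is the kind of thing I have in mind (hypothetical: the `pyppeteer_page` meta key does not exist in the handler today, it's just an idea):

```python
import scrapy

class ReusePageSpider(scrapy.Spider):
    name = "reuse_page"

    def parse(self, response):
        # Suppose the handler kept the page open and exposed it here...
        page = response.meta.get("pyppeteer_page")
        # ...and honored the same key on outgoing requests, navigating
        # the existing page instead of opening a new one.
        yield scrapy.Request(
            "https://example.org/next",
            meta={"pyppeteer": True, "pyppeteer_page": page},
            callback=self.parse_next,
        )

    def parse_next(self, response):
        self.logger.info(response.url)
```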

elacuesta avatar Dec 02 '20 16:12 elacuesta