Make pyppeteer use proxies

Open oldani opened this issue 6 years ago • 15 comments

If you're using proxies with requests-html for plain requests, all is good. Once you render a website, however, pyppeteer doesn't know about these proxies and will expose your IP. This is undesired behavior when scraping with proxies.

The idea is that whenever someone passes in proxies to the session object or any method call, make pyppeteer also use these proxies. #265
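To make the idea concrete, here is a minimal sketch of how a requests-style `proxies` dict might be translated into a Chromium `--proxy-server` flag. The helper name `chromium_proxy_args` is hypothetical and not part of requests-html or pyppeteer. It assumes the dict is keyed by plain schemes such as `http` and `https`, and it drops any credentials because Chromium does not accept them in this flag.

```python
from urllib.parse import urlsplit


def chromium_proxy_args(proxies):
    """Translate a requests-style proxies dict into Chromium flags.

    Hypothetical helper: assumes keys are plain schemes ("http", "https")
    and values are full proxy URLs that include a scheme.
    """
    rules = []
    for scheme, proxy_url in sorted(proxies.items()):
        parts = urlsplit(proxy_url)
        if not parts.hostname:
            # Skip values that cannot be parsed as scheme://host[:port].
            continue
        hostport = parts.hostname
        if parts.port:
            hostport = f"{hostport}:{parts.port}"
        # Credentials (user:password@) are intentionally left out.
        rules.append(f"{scheme}={parts.scheme}://{hostport}")
    if not rules:
        return []
    return ["--proxy-server=" + ";".join(rules)]


if __name__ == "__main__":
    print(chromium_proxy_args({
        "http": "http://user:pw@10.0.0.1:3128",
        "https": "socks5://10.0.0.2:1080",
    }))
```

The resulting list could then be appended to the `browser_args` passed to `HTMLSession`, so that the rendering browser goes through the same proxy as the plain requests.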

oldani avatar Feb 18 '19 14:02 oldani

This would be a good item to get fixed. Currently, when rendering, I have to stop using proxy servers.

Bobspadger avatar Feb 26 '19 18:02 Bobspadger

I will take on this

oldani avatar Feb 27 '19 16:02 oldani

cool thanks, I was going to take a look later but I'm not up on the whole async thing yet :)

Bobspadger avatar Feb 27 '19 16:02 Bobspadger

I am in a very restrictive corporate network and have been experiencing many issues with Python and proxies since I started using requests-html. My goal is to scrape a Cisco site that returns a lot of its HTML via JS, so I have to use the render functionality.

1st (solved manually) The initial Chromium Download of pyppeteer does not use proxies, so I had to download it manually and check where it expects to be:

python -c 'import pyppeteer; print(pyppeteer.chromium_downloader.chromiumExecutable)'

>>'win64': WindowsPath('C:/Users/XXX/AppData/Local/pyppeteer/pyppeteer/local-chromium/575458/chrome-win32/chrome.exe'

2nd (solved manually) Chromium does not accept Auth+Password given to --proxy-server="XXX" arg, see here

Now I am starting chromium with session = HTMLSession(browser_args=['--no-sandbox', '--proxy-pac-url="http://XXX/XXX.pac"']) while using the Proxy Auto Auth addon for chromium...

Start chrome.exe with the --proxy-pac-url="http://XXX/XXX.pac" argument, enter your credentials, and install the Proxy Auto Auth addon. Restart chrome.exe with the arguments and check if you can use it without any proxy auth.

3rd (not solved yet) The render function does not use my proxy:

req = session.get(url=url, proxies=proxyDict, verify=False)
req.html.render()

pyppeteer.errors.PageError: net::ERR_NAME_NOT_RESOLVED at <URL>

I would be very happy if this can be solved ...

ep4devops avatar Apr 11 '19 11:04 ep4devops

+1 On this being an amazing thing to get resolved.

FlyingZebra1 avatar May 03 '19 20:05 FlyingZebra1

Is there any news about this issue? Scraping behind corporate proxies is impossible right now... Any planned progress on this? Thank you

predicador37 avatar Aug 22 '19 13:08 predicador37

Is there any news on this? I saw this commit but don't know if it is the expected patch: https://github.com/psf/requests-html/pull/396

In my opinion, the best solution would be to be able to use proxies in the same way requests does (from the environment or a dict). Is that possible at this time?
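As a rough illustration of "from env or dict", the sketch below merges proxies found in the environment with an explicitly passed dict, letting the explicit entries win. The helper name `effective_proxies` is hypothetical; only `urllib.request.getproxies()` from the standard library is used, and this is not a claim about how requests-html resolves proxies internally.

```python
import os
from urllib.request import getproxies


def effective_proxies(explicit=None):
    """Merge environment proxies with an explicit dict (hypothetical helper).

    Environment values (e.g. http_proxy, https_proxy) form the base;
    entries in `explicit` override them.
    """
    merged = dict(getproxies())
    merged.update(explicit or {})
    return merged


if __name__ == "__main__":
    os.environ["http_proxy"] = "http://10.0.0.1:3128"
    print(effective_proxies({"https": "http://10.0.0.2:3128"}))
```

A dict resolved this way could be used both for the regular request and for configuring the browser used by `.render()`.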

lauevrar77 avatar Jun 29 '20 05:06 lauevrar77

How is this going? I would like to know how I can use socks5 proxies with requests-html... and the .render() function.

MrIdjit avatar Jul 30 '20 13:07 MrIdjit

bump? any updates?

Bobspadger avatar Feb 08 '21 17:02 Bobspadger

bump

kiriharu avatar Oct 02 '21 10:10 kiriharu

bump

W-Booth avatar May 31 '22 14:05 W-Booth

any updates?

killerdevildog11 avatar Jul 04 '22 00:07 killerdevildog11

any updates?

andrewshrout avatar Aug 27 '22 22:08 andrewshrout

I have used Selenium as an alternative; however, it is a lot slower.

killerdevildog11 avatar Oct 05 '22 23:10 killerdevildog11