
SSLError if trying to connect to Crawlera Headless Proxy with Python's urllib or requests

Open actionless opened this issue 6 years ago • 10 comments

  1. First I run Crawlera's proxy locally with Docker:
sudo docker run -ti -p 3128:3128 -p 3130:3130 scrapinghub/crawlera-headless-proxy -p 3128 -a "$CRAWLERA_API_KEY" -x profile=desktop
  2. Next I run curl on some URL using that proxy:
$ curl -x http://localhost:3128 -k https://www.google.com/

(and it works)

  3. But if I run the Python prompt like this:
env 'HTTPS_PROXY=http://localhost:3128' python

it won't work:

>>> import requests
>>> r = requests.get('https://www.google.com/', verify=False)
...
SSLError: HTTPSConnectionPool(host='www.google.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')")))

The problem reproduces only when trying to use Crawlera Headless Proxy; any other proxy server works just fine.

actionless avatar Aug 17 '19 22:08 actionless

I believe it's worth raising this issue with the Python libraries. I can also reproduce this error, and the root cause is somewhere in how the TLS handshake is managed by these libraries.

9seconds avatar Aug 27 '19 09:08 9seconds

We did some investigation previously: the root cause is that crawlera-headless-proxy denies any HTTP/1.0 connection (it checks the HTTP version just from the connection string).

So even just hardcoding it to HTTP/1.1 in the Python code helps, but it feels a bit strange that the proxy server denies all HTTP/1.0 connections:

https://github.com/python/cpython/blob/master/Lib/http/client.py#L883
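
A minimal sketch of the rewrite described above, assuming the proxy only objects to the `HTTP/1.0` version token on the CONNECT request line (the helper name is illustrative, not from any library):

```python
def force_http11(connect_line: bytes) -> bytes:
    """Upgrade a raw CONNECT request line from HTTP/1.0 to HTTP/1.1.

    Older versions of http.client hardcode HTTP/1.0 when building the
    CONNECT request in _tunnel(), which crawlera-headless-proxy rejects,
    aborting the connection before the TLS handshake completes.
    """
    if connect_line.startswith(b"CONNECT"):
        return connect_line.replace(b"HTTP/1.0", b"HTTP/1.1", 1)
    return connect_line

print(force_http11(b"CONNECT www.google.com:443 HTTP/1.0\r\n"))
# b'CONNECT www.google.com:443 HTTP/1.1\r\n'
```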

actionless avatar Aug 27 '19 20:08 actionless

I'm a little bit confused by your findings :/ Probably this is undocumented behavior in a library we use (https://github.com/valyala/fasthttp). Thanks, I'm going to track this issue further.

9seconds avatar Aug 28 '19 06:08 9seconds

Hm, I see that an issue with the same symptoms as in the first message was already closed there: https://github.com/valyala/fasthttp/issues/16

actionless avatar Aug 28 '19 11:08 actionless

Any movement on this issue? It's a major blocker for me.

ghost avatar Feb 23 '21 03:02 ghost

@jjonte-berkeley I've described the workaround in one of the messages above, so technically it can't be "a blocker".

actionless avatar Feb 23 '21 11:02 actionless

@actionless Okay, thanks. Modifying CPython's base code is the solution. It's probably worth linking to a specific commit hash of client.py so the line number stays relevant:

https://github.com/python/cpython/blob/711381dfb09fbd434cc3b404656f7fd306161a64/Lib/http/client.py#L904

ghost avatar Feb 23 '21 13:02 ghost

Modifying CPython is way too hardcore; you could just inherit from that class and override its _tunnel() method.

(I haven't been working on that project involving Crawlera for more than a year now, though :) )
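
The subclass-and-override idea can be sketched for urllib like this. Rather than copying the whole `_tunnel()` body (which differs between CPython versions), this wraps `send()` for the duration of the call and rewrites only the CONNECT line; the class names and proxy address are illustrative:

```python
import http.client
import urllib.request

class HTTP11TunnelConnection(http.client.HTTPSConnection):
    """HTTPSConnection whose CONNECT request goes out as HTTP/1.1."""

    def _tunnel(self):
        original_send = self.send

        def send_http11(data):
            # Rewrite only the CONNECT request line; pass everything else through.
            if isinstance(data, bytes) and data.startswith(b"CONNECT"):
                data = data.replace(b"HTTP/1.0", b"HTTP/1.1", 1)
            original_send(data)

        self.send = send_http11  # shadow the bound method for this call only
        try:
            super()._tunnel()
        finally:
            del self.send  # restore the class's send()

class HTTP11TunnelHandler(urllib.request.HTTPSHandler):
    """Opens HTTPS requests with the patched connection class."""

    def https_open(self, req):
        return self.do_open(HTTP11TunnelConnection, req)

opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"https": "http://localhost:3128"}),
    HTTP11TunnelHandler(),
)
# opener.open("https://www.google.com/") would now tunnel with HTTP/1.1
```

On CPython versions where `_tunnel()` already sends HTTP/1.1, the rewrite is a harmless no-op.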

actionless avatar Feb 23 '21 20:02 actionless

I am also getting this error. Changing the Python base code is not a solution for me, and I could not work out how to use an overridden _tunnel() method since I use the requests library directly. Is there some other workaround for this issue? Any help is appreciated.

anandsork avatar Jul 06 '21 06:07 anandsork

I'm having the same problem; even using the curl method, this error happens for me.

guimap avatar Jul 29 '21 19:07 guimap