dask-ec2
SSL: WRONG_VERSION_NUMBER + Ubuntu 16
Re-open of #38
I've been hacking away at this issue without much success. AWS now has a deep learning AMI for Ubuntu 16 that would save us a whole bunch of time, so I've been trying to figure out how to make this work. I'd be happy to open a pull request once I get things working, but I could use some direction.
What is different about how Ubuntu 16 handles certs that causes this?
What different configurations should I try that would make the problem more tractable?
Stack trace:
SSLError Traceback (most recent call last)
/home/ubuntu/dask-ec2/dask_ec2/cluster.py in get_pepper_client(self)
54 self._pepper = libpepper.Pepper(url, ignore_ssl_errors=True)
---> 55 self._pepper.login('saltdev', 'saltdev', 'pam')
56 except Exception:
/home/ubuntu/dask-ec2/dask_ec2/libpepper.py in login(self, username, password, eauth)
286 'password': password,
--> 287 'eauth': eauth}).get('return', [{}])[0]
288
/home/ubuntu/dask-ec2/dask_ec2/libpepper.py in req(self, path, data)
130 # con.verify_mode = ssl.CERT_NONE
--> 131 f = urlopen(req, context=con)
132 else:
/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
222 opener = _opener
--> 223 return opener.open(url, data, timeout)
224
/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout)
525
--> 526 response = self._open(req, data)
527
/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py in _open(self, req, data)
543 result = self._call_chain(self.handle_open, protocol, protocol +
--> 544 '_open', req)
545 if result:
/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
503 func = getattr(handler, meth_name)
--> 504 result = func(*args)
505 if result is not None:
/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py in https_open(self, req)
1360 return self.do_open(http.client.HTTPSConnection, req,
-> 1361 context=self._context, check_hostname=self._check_hostname)
1362
/home/ubuntu/anaconda3/lib/python3.6/urllib/request.py in do_open(self, http_class, req, **http_conn_args)
1320 raise URLError(err)
-> 1321 r = h.getresponse()
1322 except:
/home/ubuntu/anaconda3/lib/python3.6/http/client.py in getresponse(self)
1330 try:
-> 1331 response.begin()
1332 except ConnectionError:
/home/ubuntu/anaconda3/lib/python3.6/http/client.py in begin(self)
296 while True:
--> 297 version, status, reason = self._read_status()
298 if status != CONTINUE:
/home/ubuntu/anaconda3/lib/python3.6/http/client.py in _read_status(self)
257 def _read_status(self):
--> 258 line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
259 if len(line) > _MAXLINE:
/home/ubuntu/anaconda3/lib/python3.6/socket.py in readinto(self, b)
585 try:
--> 586 return self._sock.recv_into(b)
587 except timeout:
/home/ubuntu/anaconda3/lib/python3.6/ssl.py in recv_into(self, buffer, nbytes, flags)
1001 self.__class__)
-> 1002 return self.read(nbytes, buffer)
1003 else:
/home/ubuntu/anaconda3/lib/python3.6/ssl.py in read(self, len, buffer)
864 try:
--> 865 return self._sslobj.read(len, buffer)
866 except SSLError as x:
/home/ubuntu/anaconda3/lib/python3.6/ssl.py in read(self, len, buffer)
624 if buffer is not None:
--> 625 v = self._sslobj.read(len, buffer)
626 else:
SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2178)
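Some background on the error itself may make the problem more tractable. A client-side WRONG_VERSION_NUMBER means OpenSSL read back a record whose header does not carry a TLS version. A very common cause is that whatever is listening on the port is answering in plain HTTP, not TLS. Such a reply starts with "HTTP/", which puts 0x54 0x54 ("TT") where the record's version bytes belong.

The sketch below reproduces that situation entirely on localhost and assumes nothing about salt-api: a throwaway plain-TCP listener answers a TLS ClientHello with a canned HTTP response. `serve_plain_http_once` and `tls_to_plain_http_error` are hypothetical helper names, not part of dask-ec2 or libpepper. The exact reason string depends on the local OpenSSL build. The traceback above shows WRONG_VERSION_NUMBER, and other releases may word it differently.

```python
import socket
import ssl
import threading

# Canned plain-HTTP reply: stands in for a server on the port that is NOT speaking TLS
PLAIN_HTTP_REPLY = b"HTTP/1.1 400 Bad Request\r\nContent-Length: 0\r\n\r\n"


def serve_plain_http_once(listener: socket.socket) -> None:
    """Hypothetical helper: accept one connection and reply in plain text."""
    conn, _ = listener.accept()
    with conn:
        conn.settimeout(5)
        conn.recv(65536)  # swallow (at least part of) the client's TLS ClientHello
        conn.sendall(PLAIN_HTTP_REPLY)
        conn.shutdown(socket.SHUT_WR)
        # Drain until the client hangs up, so unread bytes don't trigger a TCP RST
        # (which could surface as ConnectionResetError instead of an SSLError).
        try:
            while conn.recv(65536):
                pass
        except OSError:
            pass


def tls_to_plain_http_error() -> ssl.SSLError:
    """Hypothetical helper: TLS-connect to a local plain-HTTP listener, return the error."""
    with socket.socket() as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        listener.settimeout(5)
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve_plain_http_once, args=(listener,))
        server.start()

        # No certificate checks, in the spirit of libpepper's ignore_ssl_errors=True
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        try:
            with socket.create_connection(("127.0.0.1", port), timeout=5) as raw:
                # wrap_socket performs the handshake, which fails on the "HTTP/" bytes
                with ctx.wrap_socket(raw) as tls:
                    tls.recv(1)
        except ssl.SSLError as err:
            return err
        finally:
            server.join(timeout=10)
    raise AssertionError("TLS handshake unexpectedly succeeded")
```

Printing `tls_to_plain_http_error().reason` shows the reason code your local OpenSSL assigns to this case.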
Maybe just need to update PyOpenSSL here: https://github.com/dask/dask-ec2/blob/24102d404696148cbd8a1e084614dac7276d047e/dask_ec2/salt.py#L190
Or provide valid certs here: https://github.com/dask/dask-ec2/blob/516b83d479066b4b510650b64c0b1864b43e4a6f/dask_ec2/templates/rest_cherrypy.conf#L3-L4
I also just hit this. Our group has standardized on Ubuntu 16, so going back to 14 is not a real option. It's unclear to me what the issue really is or how we could work around it. Any suggestions?
So far, I've tried:
- PyOpenSSL versions 16.2.0, 17.2.0, and 18.0
- Generating those certs manually
- Now I'm starting to translate the urllib requests to cURL to see if I can get this to work at some level.
This is proving to be a larger problem for us because decent AMI packages for e.g. TensorFlow and CUDA are standardizing around 16.04, while 14.04 is increasingly stale. Also, configuring those manually is quite time-intensive.
Update: I've been digging around in the SaltStack configuration and trying to make any SSL-validated request work from localhost on the child node, swapping in the requests library:
```python
import ssl

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager


class TLSAdapter(HTTPAdapter):
    """First attempt: let the client negotiate the highest common TLS version."""
    # https://lukasa.co.uk/2013/01/Choosing_SSL_Version_In_Requests/
    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=ssl.PROTOCOL_TLS)


class MyAdapter(HTTPAdapter):
    """Second attempt: pin TLS 1.2 and verify against the salt-api certificate."""
    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       # cert_file is meant for a *client* certificate;
                                       # pointing it at the server's .key is likely unintended
                                       cert_file="/etc/pki/tls/certs/localhost.key",
                                       ca_certs="/etc/pki/tls/certs/localhost.crt",
                                       cert_reqs="CERT_REQUIRED",
                                       ssl_version=ssl.PROTOCOL_TLSv1_2)


s = requests.Session()
s.mount("https://", MyAdapter())
url = "https://localhost:8000/login"
headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json',
    'X-Requested-With': 'XMLHttpRequest',
}
req = s.get(url, headers=headers, verify="/etc/pki/tls/certs/localhost.crt",
            auth=("saltdev", "saltdev"))
```
This still raises SSLError: [SSL: WRONG_VERSION_NUMBER].
OpenSSL investigations:
```shell
openssl s_client -connect localhost:8000
```
returns, among other things:
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Sorry, but what is "localhost:8000" here and how is it related to EC2 or Amazon?
So after a lot of digging, etc, I realised that this issue is probably the basis of the problem:
To check whether this is indeed the problem, on the server (on AWS) I uninstalled salt-api, downgraded CherryPy to version 3.2.3, then reinstalled salt-api* (and rebooted for good measure):
```shell
sudo apt-get remove salt-api
sudo pip uninstall cherrypy
sudo pip install cherrypy==3.2.3
sudo apt-get install salt-api
```
I could test this using the openssl command:
```shell
openssl s_client -connect 54.194.146.93:8000 -debug
```
Previous output:
read from 0x17bcdb0 [0x17e7993] (5 bytes => 5 (0x5))
0000 - 48 54 54 50 2f HTTP/
write to 0x17bcdb0 [0x17ebee3] (31 bytes => 31 (0x1F))
0000 - 15 03 03 00 1a e5 a0 62-98 dd 8a e6 6f 02 b8 08 .......b....o...
0010 - 6b 9d eb a2 bf 8b ff aa-88 ec 0d dd 77 97 94 k...........w..
140689769182872:error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number:s3_pkt.c:365:
write to 0x17bcdb0 [0x17ebee3] (31 bytes => 31 (0x1F))
0000 - 15 03 03 00 1a e5 a0 62-98 dd 8a e6 70 1b 91 02 .......b....p...
0010 - 35 c3 43 89 bb bd d7 e9-d8 41 c4 48 08 32 47 5.C......A.H.2G
Output with the change on the server:
read from 0xc0cdb0 [0xc37993] (5 bytes => 0 (0x0))
read:errno=0
write to 0xc0cdb0 [0xc3bee3] (31 bytes => 31 (0x1F))
0000 - 15 03 03 00 1a 32 b1 57-5e ee 5e 4b 0a 2e 2d ec .....2.W^.^K..-.
0010 - a6 ca a5 eb c9 e9 ce 10-f5 f8 a5 d2 2b 07 66 ............+.f
I guess that means it's working?
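The debug dumps can be decoded by hand, because every TLS record starts with a 5-byte header: a content-type byte, two version bytes and a two-byte payload length. A minimal decoder, written for this thread, shows the following. In the "previous output", the server's first 5 bytes were plain ASCII "HTTP/". Each 31-byte write is the client's own TLS 1.2 alert record: a 5-byte header plus a 26-byte payload. In the output after the change, the read returned 0 bytes rather than an HTTP reply.

`describe_record_header` is a hypothetical helper, and its tables cover only the four content types from RFC 5246 and the SSL 3.0 to TLS 1.2 version codes.

```python
# TLS record content types (RFC 5246, section 6.2.1)
CONTENT_TYPES = {20: "change_cipher_spec", 21: "alert", 22: "handshake", 23: "application_data"}
# Record-layer version bytes (major, minor)
VERSIONS = {(3, 0): "SSL 3.0", (3, 1): "TLS 1.0", (3, 2): "TLS 1.1", (3, 3): "TLS 1.2"}


def describe_record_header(header: bytes) -> str:
    """Interpret the first 5 bytes read from a socket as a TLS record header."""
    if len(header) < 5:
        return "short read: %d bytes" % len(header)
    content_type, major, minor = header[0], header[1], header[2]
    length = int.from_bytes(header[3:5], "big")
    if content_type not in CONTENT_TYPES or (major, minor) not in VERSIONS:
        # Not a TLS record: show the bytes as text, in case it is e.g. plain HTTP
        return "not TLS: %r" % header.decode("latin-1")
    return "%s record, %s, %d-byte payload" % (
        CONTENT_TYPES[content_type], VERSIONS[(major, minor)], length)
```

For example, `describe_record_header(bytes.fromhex("48 54 54 50 2f"))` gives `"not TLS: 'HTTP/'"`, and `describe_record_header(bytes.fromhex("15 03 03 00 1a"))` gives `"alert record, TLS 1.2, 26-byte payload"`.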
I copied a code snippet from the libpepper.py file in dask_ec2 to reproduce the error:
```python
import ssl
from urllib.request import Request, urlopen

# Adapted from libpepper.py: a bare SSLContext, which performs no certificate
# verification (PROTOCOL_SSLv23 negotiates the highest version both sides support)
con = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
req = Request('https://54.194.146.93:8000/login')
urlopen(req, context=con)
```
Before the change, this would produce SSLV3_ALERT_HANDSHAKE_FAILURE or wrong version number errors. Now it doesn't fail:
<Response [200]>
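The snippet above builds its context with ssl.PROTOCOL_SSLv23. Since Python 3.6 that constant is a deprecated alias of ssl.PROTOCOL_TLS, which was itself deprecated in 3.10 in favour of PROTOCOL_TLS_CLIENT and PROTOCOL_TLS_SERVER.

Below is a minimal sketch of an equivalent client context for current Python. It assumes you really do want to skip certificate checks, as libpepper's ignore_ssl_errors=True intends. That is fine for a quick test against a self-signed salt-api certificate, but not for production. `make_unverified_context` is a hypothetical helper name.

```python
import ssl


def make_unverified_context() -> ssl.SSLContext:
    """Client-side context that negotiates TLS but skips certificate checks."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # check_hostname must be turned off before verify_mode can be set to CERT_NONE,
    # otherwise assigning CERT_NONE raises ValueError
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

It drops into the snippet above as `urlopen(req, context=make_unverified_context())`. Unlike the bare `SSLContext(PROTOCOL_SSLv23)`, PROTOCOL_TLS_CLIENT also applies the client-side defaults newer Pythons expect (for example, no SSLv3).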
The only remaining problem is how to get the Ubuntu install to use the older version of CherryPy. For myself, I think I'll build a Xenial image on AWS with this correction, and hopefully I can point dask-ec2 at it. If my understanding of the above is correct and I'm right in my conclusions and fix, maybe building an appropriate image in each region is the way to go (at least until Salt is fixed).
Hopefully the above is useful; sorry if I've got anything wrong!
*Warning: I don't know whether there are security-related bugs in CherryPy that I could be reintroducing here.
Edit: I altered dask_ec2/salt.py so that it just pip installs version 3.2.3 of CherryPy:
```python
# Excerpt from dask_ec2/salt.py: `retry` and `master` come from the surrounding module
@retry(retries=3, wait=0)
def __install_salt_rest_api():
    # Pin CherryPy to 3.2.3
    cmd = "pip install cherrypy==3.2.3"
    ret = master.exec_command(cmd, sudo=True)
    if ret["exit_code"] != 0:
        raise Exception(ret["stderr"].decode('utf-8'))
```
I think this now works with Ubuntu 16.04, without any other changes.
It could do with some testing from other people - e.g. on different versions of ubuntu or using different images, etc.
I've documented my installation procedure etc here, if it's useful!
@lionfish0, thanks for your fixes. I still have an issue with your procedure, though (see below):
Installing scheduler
+---------+----------------------+-----------------+
| Node ID | # Successful actions | # Failed action |
+=========+======================+=================+
| node-0 | 19 | 4 |
+---------+----------------------+-----------------+
Failed states for 'node-0'
file | dask-scheduler.conf | /etc/supervisor/conf.d//dask-scheduler.conf | managed: One or more requisite failed: dask.distributed.correct_perms
file | correct_perms | /opt/anaconda/ | directory: An exception occurred in this state: Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/salt/state.py", line 1878, in call
**cdata['kwargs'])
File "/usr/lib/python2.7/dist-packages/salt/loader.py", line 1823, in wrapper
return f(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/salt/states/file.py", line 3098, in directory
full, ret, user, group, file_mode, None, follow_symlinks)
File "/usr/lib/python2.7/dist-packages/salt/modules/file.py", line 4397, in check_perms
perms['lattrs'] = ''.join(lsattr(name).get('name', ''))
File "/usr/lib/python2.7/dist-packages/salt/modules/file.py", line 552, in lsattr
raise SaltInvocationError("File or directory does not exist.")
SaltInvocationError: File or directory does not exist.
cmd | dask-scheduler-update-supervisor | /usr/bin/supervisorctl -c /etc/supervisor/supervisord.conf update && sleep 2 | wait: One or more requisite failed: dask.distributed.scheduler.dask-scheduler.conf
supervisord | dask-scheduler-running | dask-scheduler | running: One or more requisite failed: dask.distributed.scheduler.dask-scheduler-update-supervisor, dask.distributed.correct_perms, dask.distributed.scheduler.dask-scheduler.conf
I've started having the same problem too - I think something else has been updated which has caused the above new error.
As it says in the dask-ec2 README, this project is now deprecated, so I didn't try fixing the new bug. I tried using Kubernetes for a while, but it's quite a pain to set up (maybe not well documented yet) and is serious overkill for what I want. So instead...
I've written a replacement for dask-ec2, which I've called daskec2lite: https://github.com/lionfish0/daskec2lite.
It needs a little bit more work but is nearly finished - I'll hopefully have some time later in the year to get it to a more 'release' state, but feel free to use it (it currently just makes spot instances, and there's probably other limitations, but hopefully it'll be useful to you). Feel free to add issues/feature-requests or pull requests.
If you think that daskec2lite is a good replacement for dask-ec2 I recommend making it more visible first by raising an issue that asks people to investigate it, and then perhaps with a PR to the README
(Sent by email in reply to Mike Smith's comment of Wed, May 9, 2018 at 10:13 AM.)
I wasn't sure whether my failure to get Kubernetes working was just my own incompetence, but I was in a hurry and needed something, so I quickly cobbled together daskec2lite. I'm not sure it's the best path for people to go down (presumably something more cross-cloud-platform would be better), and it needs a little more work before I'd advise lots of people to use it. Depending on feedback from a few users, I'll see whether it's worth finishing and supporting properly. @jpoullet2000 if you do try it, please let me know what works and what doesn't.
Thanks @mrocklin, if I go ahead with it as a proper project, I'll make a PR to your README in late June (by then I'll have fixed bugs etc). Great work with dask etc, btw. Thanks!
Thx. I'll have a look and let you know.
(Sent by email in reply to Mike Smith's comment of 2018-05-09 16:49.)
After a quick test, here is the error I get:
(dasklite) jbp@jbp-XPS-L521X:~$ daskec2lite --pathtokeyfile ~/.ssh/datascience.pem --keyname datascience --username ubuntu --numinstances 2 --instancetype c4.2xlarge --region eu-west-1 --imageid ami-c8b51fb1 --wpi 2 --sgid sg-c18336bc --spotprice 3
Traceback (most recent call last):
File "/home/jbp/miniconda3/envs/dasklite/bin/daskec2lite", line 11, in <module>
sys.exit(main())
File "/home/jbp/miniconda3/envs/dasklite/lib/python3.6/site-packages/daskec2lite/daskec2lite.py", line 180, in main
imageid=args.imageid,keyname=args.keyname,spotprice=args.spotprice,region_name=args.region_name)
File "/home/jbp/miniconda3/envs/dasklite/lib/python3.6/site-packages/daskec2lite/daskec2lite.py", line 28, in start_cluster
'SecurityGroupIds': [ sgid ]
File "/home/jbp/miniconda3/envs/dasklite/lib/python3.6/site-packages/botocore/client.py", line 314, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/jbp/miniconda3/envs/dasklite/lib/python3.6/site-packages/botocore/client.py", line 612, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidGroup.NotFound) when calling the RequestSpotInstances operation: The security group 'sg-9146afe9' does not exist in VPC 'vpc-a72c1ec0'
As this is for a different project, I've copied the issue over, thanks @jpoullet2000!
@jpoullet2000 by the way, the bug you describe should now be fixed in daskec2lite.