dispy
Problem with dispy and SSL
Hi,
I cannot set up dispy to work with an SSL certificate. It works only for a node on the same machine, not for remote nodes.
Here is the command I used to create the certificate:
openssl req -x509 -newkey rsa:4096 -sha256 -nodes -keyout private.key -out private.crt -days 3650
Then I merged the two files:
cat private.crt private.key > private.pem
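A quick way to rule out a damaged or incomplete private.pem (for example after copying it to the Windows machine) is to load it with Python's own ssl module on each host. This is only a minimal sketch: the helper name pem_loads is made up for illustration and is not part of dispy or pycos, and a successful load only shows that the certificate and key parse and match, not that dispy will accept them.

```python
import ssl

def pem_loads(certfile, keyfile=None):
    """Return (True, '') if the certificate chain and private key load,
    otherwise (False, reason). Hypothetical helper, not a dispy API."""
    ctx = ssl.create_default_context()
    try:
        # With keyfile=None the private key is read from certfile itself,
        # which matches the merged private.pem created above.
        ctx.load_cert_chain(certfile, keyfile)
    except OSError as exc:
        # ssl.SSLError and FileNotFoundError both derive from OSError.
        return False, str(exc)
    return True, ''

if __name__ == '__main__':
    # Path taken from the post; adjust it on each machine.
    print(pem_loads('/home/dismine/project/private.pem'))
```

Running the same check on the Windows node with C:\project\private.pem would show whether the file survived the transfer intact.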
I work on Ubuntu 16.04.
uname -a
Linux dismine 4.10.0-42-generic #46~16.04.1-Ubuntu SMP Mon Dec 4 15:57:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
python3 --version
Python 3.5.2
openssl version
OpenSSL 1.0.2g 1 Mar 2016
I call dispy from a virtual environment:
amqp==2.2.2 billiard==3.5.0.3 boltons==17.1.0 celery==4.1.0 certifi==2017.11.5 chardet==3.0.4 concurrent-log-handler==0.9.7 coreapi==2.3.3 coreschema==0.0.4 dispy==4.8.3 Django==1.11.7 django-cleanup==1.0.1 django-dbbackup==3.2.0 djangorestframework==3.7.3 idna==2.6 itypes==1.1.0 Jinja2==2.10 kombu==4.1.0 lxml==4.1.1 Markdown==2.6.10 MarkupSafe==1.0 psutil==5.4.1 pycos==4.6.5 Pygments==2.2.0 pylibmc==1.5.2 python-memcached==1.58 pytz==2017.3 PyYAML==3.12 requests==2.18.4 six==1.11.0 uritemplate==3.0.0 urllib3==1.22 vine==1.1.4
Here is the job I submit:
# function 'compute' is distributed and executed with arguments
# supplied with 'cluster.submit' below
def compute(n):
    import time, socket
    time.sleep(n)
    host = socket.gethostname()
    return (host, n)

if __name__ == '__main__':
    # executed on client only; variables created below, including modules imported,
    # are not available in job computations
    import dispy, random
    # distribute 'compute' to nodes; in this case, 'compute' does not have
    # any dependencies to run on nodes
    cluster = dispy.SharedJobCluster(compute,
                                     ip_addr=['127.0.0.1'],
                                     port=0,
                                     certfile='/home/dismine/project/private.pem')
    # run 'compute' as 5 jobs on available CPUs; randint(5, 5) always returns 5,
    # so each job sleeps for 5 seconds
    jobs = []
    for i in range(5):
        job = cluster.submit(random.randint(5, 5))
        job.id = i  # associate an ID to identify jobs (if needed later)
        jobs.append(job)
    # cluster.wait()  # waits until all jobs finish
    for job in jobs:
        host, n = job()  # waits for job to finish and returns results
        print('%s executed job %s at %s with %s' % (host, job.id, job.start_time, n))
        # other fields of 'job' that may be useful:
        # job.stdout, job.stderr, job.exception, job.ip_addr, job.end_time
    cluster.print_status()  # shows which nodes executed how many jobs etc.
The case that works for me:
dispynode.py -d --certfile=/home/dismine/project/private.pem --clean
dispyscheduler.py -d --node_certfile /home/dismine/project/private.pem --cluster_certfile /home/dismine/project/private.pem
But when I try to add a remote node, in my case on Windows 7, dispy crashes with one of two different errors, depending on the order in which the programs are started.
The commands I run:
python "C:\Program Files\Python3\Lib\site-packages\dispy\dispynode.py" -d --ip_addr 192.168.0.101 --certfile=C:\project\private.pem --clean
dispyscheduler.py -d --ip_addr 127.0.1.1 --ip_addr 192.168.0.103 --node_certfile /home/dismine/project/private.pem --cluster_certfile /home/dismine/project/private.pem
Case №1: dispyscheduler.py is started first. When I then start dispynode.py on Windows, I see various errors:
C:\project>python "C:\Program Files\Python3\Lib\site-packages\dispy\dispynode.py" -d --ip_addr 192.168.0.101 --certfile=C:\project\private.pem --clean
Reading standard input disabled, as multiprocessing does not seem to workwith reading input under Windows
2018-01-03 12:50:08 dispynode - dispynode version: 4.8.3, PID: 3288
2018-01-03 12:50:08 pycos - version 4.6.5 with IOCP I/O notifier
2018-01-03 12:50:08 dispynode - "Роман-ПК" serving 2 cpus
2018-01-03 12:50:08 dispynode - TCP server at 192.168.0.101:51348
2018-01-03 12:50:08 pycos - uncaught exception in !tcp_req/36277304:
Traceback (most recent call last):
File "C:\Program Files\Python3\lib\site-packages\pycos\__init__.py", line 3730, in _schedule
retval = task._generator.throw(*exc)
File "C:\Program Files\Python3\Lib\site-packages\dispy\dispynode.py", line 988, in tcp_req
msg = yield conn.recv_msg()
File "C:\Program Files\Python3\lib\site-packages\pycos\__init__.py", line 3732, in _schedule
retval = task._generator.send(task._value)
File "C:\Program Files\Python3\lib\site-packages\pycos\__init__.py", line 882, in _async_recv_msg
data = yield self.recvall(n)
File "C:\Program Files\Python3\lib\site-packages\pycos\__init__.py", line 1438, in _iocp_recvall
self._read_result = win32file.AllocateReadBuffer(bufsize)
MemoryError
C:\project>python "C:\Program Files\Python3\Lib\site-packages\dispy\dispynode.py" -d --ip_addr 192.168.0.101 --certfile=C:\project\private.pem --clean
Reading standard input disabled, as multiprocessing does not seem to workwith reading input under Windows
2018-01-03 12:54:12 dispynode - dispynode version: 4.8.3, PID: 3284
2018-01-03 12:54:12 pycos - version 4.6.5 with IOCP I/O notifier
2018-01-03 12:54:12 dispynode - "Роман-ПК" serving 2 cpus
2018-01-03 12:54:12 dispynode - TCP server at 192.168.0.101:51348
2018-01-03 12:54:13 pycos - uncaught exception in !tcp_req/36342840:
Traceback (most recent call last):
File "C:\Program Files\Python3\lib\site-packages\pycos\__init__.py", line 3730, in _schedule
retval = task._generator.throw(*exc)
File "C:\Program Files\Python3\Lib\site-packages\dispy\dispynode.py", line 988, in tcp_req
msg = yield conn.recv_msg()
File "C:\Program Files\Python3\lib\site-packages\pycos\__init__.py", line 3732, in _schedule
retval = task._generator.send(task._value)
File "C:\Program Files\Python3\lib\site-packages\pycos\__init__.py", line 882, in _async_recv_msg
data = yield self.recvall(n)
File "C:\Program Files\Python3\lib\site-packages\pycos\__init__.py", line 1438, in _iocp_recvall
self._read_result = win32file.AllocateReadBuffer(bufsize)
OverflowError: Python int too large to convert to C long
In case №1, dispyscheduler.py doesn't show any error message.
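One possible reading of the two tracebacks above, offered as a guess rather than a diagnosis: both fail in AllocateReadBuffer(bufsize) inside _async_recv_msg, which suggests the receiver took a message length from the stream and that length was nonsense. The sketch below assumes a 4-byte big-endian length prefix (an assumption about the wire format, not confirmed in this thread) and shows what happens when raw TLS record bytes are read as such a length. The function name claimed_length is made up for illustration.

```python
import struct

def claimed_length(first_bytes):
    """Interpret the first 4 bytes as a big-endian unsigned 32-bit length.
    Assumed framing for illustration; not taken from the pycos source."""
    return struct.unpack('>L', first_bytes[:4])[0]

# Start of a TLS handshake record: content type 0x16, version 0x03 0x01,
# then the high byte of the record length (0x02 chosen arbitrarily here).
tls_record_start = bytes([0x16, 0x03, 0x01, 0x02])

# Any 4 bytes whose top bit is set, e.g. the start of encrypted data.
high_bit_bytes = bytes([0x80, 0x00, 0x00, 0x00])

C_LONG_MAX_WINDOWS = 2**31 - 1  # C long is 32 bits on Windows

for label, raw in (('TLS record header', tls_record_start),
                   ('high-bit bytes', high_bit_bytes)):
    n = claimed_length(raw)
    print(label, n, 'exceeds C long' if n > C_LONG_MAX_WINDOWS else 'fits in C long')
```

Under that assumption, a TLS record header read as a length asks for a buffer of about 352 MiB, which would fit the MemoryError, while bytes with the top bit set give a value that does not fit in a C long on Windows, which would fit the OverflowError.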
Case №2: dispyscheduler.py is started second. In this case I see this error:
dispyscheduler.py -d --ip_addr 127.0.1.1 --ip_addr 192.168.0.103 --node_certfile /home/dismine/project/private.pem --cluster_certfile /home/dismine/project/private.pem
2018-01-03 12:59:54 dispyscheduler - dispyscheduler version 4.8.3
2018-01-03 12:59:54 pycos - version 4.6.5 with epoll I/O notifier
Enter "quit" or "exit" to terminate scheduler, anything else to get status: 2018-01-03 12:59:54 dispyscheduler - Scheduler at 192.168.0.103:51349
2018-01-03 12:59:54 dispyscheduler - TCP server at 127.0.1.1:51347
2018-01-03 12:59:54 dispyscheduler - TCP server at 192.168.0.103:51347
2018-01-03 12:59:54 dispyscheduler - Scheduler at 127.0.1.1:51349
2018-01-03 12:59:54 pycos - uncaught exception in !tcp_req/139968956623560:
Traceback (most recent call last):
File "/home/dismine/project/env/lib/python3.5/site-packages/pycos/__init__.py", line 3730, in _schedule
retval = task._generator.throw(*exc)
File "/home/dismine/project/env/bin/dispyscheduler.py", line 368, in tcp_req
msg = yield conn.recv_msg()
File "/home/dismine/project/env/lib/python3.5/site-packages/pycos/__init__.py", line 3730, in _schedule
retval = task._generator.throw(*exc)
File "/home/dismine/project/env/lib/python3.5/site-packages/pycos/__init__.py", line 868, in _async_recv_msg
data = yield self.recvall(n)
File "/home/dismine/project/env/lib/python3.5/site-packages/pycos/__init__.py", line 461, in _recvall
recvd = self._rsock.recv_into(view, len(view), *args)
File "/usr/lib/python3.5/ssl.py", line 929, in recv_into
return self.read(nbytes, buffer)
File "/usr/lib/python3.5/ssl.py", line 791, in read
return self._sslobj.read(len, buffer)
File "/usr/lib/python3.5/ssl.py", line 575, in read
v = self._sslobj.read(len, buffer)
ssl.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1977)
In this case, dispynode.py also doesn't show any error message.
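WRONG_VERSION_NUMBER generally means that the bytes received were not a valid TLS record, which often happens when the peer is sending plain (non-TLS) data on that connection. A minimal sketch for checking this from the other machine follows: it tries a TLS handshake against a host and port and reports whether it completes. The function name speaks_tls is hypothetical, certificate verification is deliberately turned off because the certificate here is self-signed, and how a dispy node reacts to such a probe is not something this thread confirms.

```python
import socket
import ssl

def speaks_tls(host, port, timeout=3.0):
    """Return True if a TLS handshake with host:port completes.
    Diagnostic sketch only; verification is disabled on purpose."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # self-signed certificate in this setup
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock):
                return True
    except OSError:
        # Covers ssl.SSLError, timeouts and refused connections.
        return False

if __name__ == '__main__':
    # Address and port taken from the dispynode log above.
    print(speaks_tls('192.168.0.101', 51348))
```

If the probe fails against the Windows node but succeeds against a Linux node started with the same --certfile, that would point at the node's socket layer rather than at the certificate.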
I can also reproduce this error on CentOS 7 with Python 3.5 and on Windows Server.
It is worth mentioning that it works if I leave SSL enabled only between the cluster and the scheduler.
Can you explain where my mistake is?
I have tested this with a Linux node and scheduler and an OS X client. It worked fine. I don't have a Windows computer to test the setup you used.
You can try different configurations to isolate the issue: for example, scheduler and node running Linux but the client running Windows, or different SSL setups, such as node and scheduler using SSL but not the client, or scheduler and client using SSL but not the node. If you can isolate this to one combination, it may help.
I am also wondering whether the issue may be different (incompatible?) SSL versions across CentOS and Windows in your setup.
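To work through the combinations suggested above without missing one, they can be listed mechanically. This is only a sketch: the role, platform and link labels are made up for illustration, and it lists every pairing of platform per role with SSL on or off per link, which is more than needs to be tested in practice.

```python
from itertools import product

# Illustrative labels, not dispy options.
ROLES = ('client', 'scheduler', 'node')
PLATFORMS = ('Linux', 'Windows')
LINKS = ('client-scheduler', 'scheduler-node')

def configurations():
    """Yield (platform per role, SSL flag per link) for every combination."""
    for platforms in product(PLATFORMS, repeat=len(ROLES)):
        for ssl_flags in product((False, True), repeat=len(LINKS)):
            yield dict(zip(ROLES, platforms)), dict(zip(LINKS, ssl_flags))

configs = list(configurations())
for platforms, ssl_flags in configs:
    roles = ', '.join('%s=%s' % (r, platforms[r]) for r in ROLES)
    links = ', '.join('%s SSL %s' % (l, 'on' if ssl_flags[l] else 'off') for l in LINKS)
    print('%s | %s' % (roles, links))
```

Marking each printed line as working or failing should make it clear whether the failures follow the platform, the link, or both.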
Hi,
I have just tested scheduler on Linux and node on Mac OS X and it works. Don't know why, but pycos shows error "fd
I have done more testing. dispy works on Windows, but only partially. When I don't use SSL at all, it works: I can connect nodes on Windows and Linux. It also works when I use SSL for nodes, but only if they are on Windows; the scheduler ignores connections from a Linux node. It also doesn't work with a secure connection between cluster and scheduler: I get the same errors as in my first post.
I just tested previous versions, with the same result. An SSL connection between a scheduler on Windows and a node on Linux doesn't work. To me it looks like an incompatibility between the Windows and Unix socket implementations.
We should probably redirect this issue to pycos, because disabling I/O Completion Ports (IOCP) resolves the issue.
Yes, apparently SSL with IOCP doesn't work. It seems to work if both sides are Windows with IOCP (I have only tested with client and server on the same Windows machine, so it is unknown whether it works when the other side is remote). However, if the other side is not Windows, SSL doesn't work when used with IOCP.