Problem with dispy and SSL

Open dismine opened this issue 7 years ago • 6 comments

Hi,

I cannot set up dispy to work with an SSL certificate. It works only with a node on the same machine, not with remote nodes.

Here is the command I used to create the certificate: openssl req -x509 -newkey rsa:4096 -sha256 -nodes -keyout private.key -out private.crt -days 3650

Then I merged the two files: cat private.crt private.key > private.pem
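A quick way to confirm that the merged file really contains both a certificate and a private key is to look at its PEM block headers. The following is only a minimal sketch; the helper names `pem_block_labels` and `has_cert_and_key` are hypothetical and not part of dispy, and the check looks at the headers only, not at whether the key matches the certificate.

```python
import re

# Matches PEM header lines such as "-----BEGIN CERTIFICATE-----".
PEM_BEGIN = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def pem_block_labels(pem_text):
    """Return the labels of all PEM blocks, in the order they appear."""
    return PEM_BEGIN.findall(pem_text)


def has_cert_and_key(pem_text):
    """True if the text has a CERTIFICATE block and some PRIVATE KEY block."""
    labels = pem_block_labels(pem_text)
    has_cert = "CERTIFICATE" in labels
    # Covers "PRIVATE KEY", "RSA PRIVATE KEY", "EC PRIVATE KEY", etc.
    has_key = any(label.endswith("PRIVATE KEY") for label in labels)
    return has_cert and has_key


if __name__ == "__main__":
    import sys

    # Usage: python check_pem.py private.pem  (script name is an assumption)
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as pem_file:
            text = pem_file.read()
        print(pem_block_labels(text), has_cert_and_key(text))
```

Running the same check on every machine that received a copy of private.pem can rule out a truncated or badly transferred file.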

I work on Ubuntu 16.04.

uname -a
Linux dismine 4.10.0-42-generic #46~16.04.1-Ubuntu SMP Mon Dec 4 15:57:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
python3 --version
Python 3.5.2
openssl version
OpenSSL 1.0.2g  1 Mar 2016

I call dispy from a virtual environment:

amqp==2.2.2
billiard==3.5.0.3
boltons==17.1.0
celery==4.1.0
certifi==2017.11.5
chardet==3.0.4
concurrent-log-handler==0.9.7
coreapi==2.3.3
coreschema==0.0.4
dispy==4.8.3
Django==1.11.7
django-cleanup==1.0.1
django-dbbackup==3.2.0
djangorestframework==3.7.3
idna==2.6
itypes==1.1.0
Jinja2==2.10
kombu==4.1.0
lxml==4.1.1
Markdown==2.6.10
MarkupSafe==1.0
psutil==5.4.1
pycos==4.6.5
Pygments==2.2.0
pylibmc==1.5.2
python-memcached==1.58
pytz==2017.3
PyYAML==3.12
requests==2.18.4
six==1.11.0
uritemplate==3.0.0
urllib3==1.22
vine==1.1.4

Here is the job I submit:

# function 'compute' is distributed and executed with arguments
# supplied with 'cluster.submit' below


def compute(n):
    import time, socket
    time.sleep(n)
    host = socket.gethostname()
    return (host, n)


if __name__ == '__main__':
    # executed on client only; variables created below, including modules imported,
    # are not available in job computations
    import dispy, random
    # distribute 'compute' to nodes; in this case, 'compute' does not have
    # any dependencies to run on nodes
    cluster = dispy.SharedJobCluster(compute,
                                     ip_addr=['127.0.0.1'],
                                     port=0,
                                     certfile='/home/dismine/project/private.pem')
    # run 'compute' 5 times (each job sleeps 5 seconds) on available CPUs
    jobs = []
    for i in range(5):
        job = cluster.submit(random.randint(5, 5))
        job.id = i # associate an ID to identify jobs (if needed later)
        jobs.append(job)
    # cluster.wait() # waits until all jobs finish
    for job in jobs:
        host, n = job() # waits for job to finish and returns results
        print('%s executed job %s at %s with %s' % (host, job.id, job.start_time, n))
        # other fields of 'job' that may be useful:
        # job.stdout, job.stderr, job.exception, job.ip_addr, job.end_time
    cluster.print_status()  # shows which nodes executed how many jobs etc.

This case works for me:

dispynode.py -d --certfile=/home/dismine/project/private.pem --clean
dispyscheduler.py -d --node_certfile /home/dismine/project/private.pem --cluster_certfile /home/dismine/project/private.pem

But when I try to add a remote node, in my case on Windows 7, dispy crashes with one of two different errors, depending on the order in which I start the programs.

The commands I run:

python "C:\Program Files\Python3\Lib\site-packages\dispy\dispynode.py" -d --ip_addr 192.168.0.101 --certfile=C:\project\private.pem --clean
dispyscheduler.py -d --ip_addr 127.0.1.1 --ip_addr 192.168.0.103 --node_certfile /home/dismine/project/private.pem --cluster_certfile /home/dismine/project/private.pem

Case №1: dispyscheduler.py is started first. If I then start dispynode.py on Windows, I see several different errors:

C:\project>python "C:\Program Files\Python3\Lib\site-packages\dispy\dispynode.py" -d --ip_addr 192.168.0.101 --certfile=C:\project\private.pem --clean

Reading standard input disabled, as multiprocessing does not seem to workwith reading input under Windows
2018-01-03 12:50:08 dispynode - dispynode version: 4.8.3, PID: 3288
2018-01-03 12:50:08 pycos - version 4.6.5 with IOCP I/O notifier
2018-01-03 12:50:08 dispynode - "Роман-ПК" serving 2 cpus
2018-01-03 12:50:08 dispynode - TCP server at 192.168.0.101:51348
2018-01-03 12:50:08 pycos - uncaught exception in !tcp_req/36277304:
Traceback (most recent call last):
  File "C:\Program Files\Python3\lib\site-packages\pycos\__init__.py", line 3730, in _schedule
    retval = task._generator.throw(*exc)
  File "C:\Program Files\Python3\Lib\site-packages\dispy\dispynode.py", line 988, in tcp_req
    msg = yield conn.recv_msg()
  File "C:\Program Files\Python3\lib\site-packages\pycos\__init__.py", line 3732, in _schedule
    retval = task._generator.send(task._value)
  File "C:\Program Files\Python3\lib\site-packages\pycos\__init__.py", line 882, in _async_recv_msg
    data = yield self.recvall(n)
  File "C:\Program Files\Python3\lib\site-packages\pycos\__init__.py", line 1438, in _iocp_recvall
    self._read_result = win32file.AllocateReadBuffer(bufsize)
MemoryError

C:\project>python "C:\Program Files\Python3\Lib\site-packages\dispy\dispynode.py" -d --ip_addr 192.168.0.101 --certfile=C:\project\private.pem --clean

Reading standard input disabled, as multiprocessing does not seem to workwith reading input under Windows
2018-01-03 12:54:12 dispynode - dispynode version: 4.8.3, PID: 3284
2018-01-03 12:54:12 pycos - version 4.6.5 with IOCP I/O notifier
2018-01-03 12:54:12 dispynode - "Роман-ПК" serving 2 cpus
2018-01-03 12:54:12 dispynode - TCP server at 192.168.0.101:51348
2018-01-03 12:54:13 pycos - uncaught exception in !tcp_req/36342840:
Traceback (most recent call last):
  File "C:\Program Files\Python3\lib\site-packages\pycos\__init__.py", line 3730, in _schedule
    retval = task._generator.throw(*exc)
  File "C:\Program Files\Python3\Lib\site-packages\dispy\dispynode.py", line 988, in tcp_req
    msg = yield conn.recv_msg()
  File "C:\Program Files\Python3\lib\site-packages\pycos\__init__.py", line 3732, in _schedule
    retval = task._generator.send(task._value)
  File "C:\Program Files\Python3\lib\site-packages\pycos\__init__.py", line 882, in _async_recv_msg
    data = yield self.recvall(n)
  File "C:\Program Files\Python3\lib\site-packages\pycos\__init__.py", line 1438, in _iocp_recvall
    self._read_result = win32file.AllocateReadBuffer(bufsize)
OverflowError: Python int too large to convert to C long

In case №1, dispyscheduler.py doesn't show any error message.

Case №2: dispyscheduler.py is started second. In this case I see this error:

dispyscheduler.py -d --ip_addr 127.0.1.1 --ip_addr 192.168.0.103 --node_certfile /home/dismine/project/private.pem --cluster_certfile /home/dismine/project/private.pem
2018-01-03 12:59:54 dispyscheduler - dispyscheduler version 4.8.3
2018-01-03 12:59:54 pycos - version 4.6.5 with epoll I/O notifier
Enter "quit" or "exit" to terminate scheduler, anything else to get status: 2018-01-03 12:59:54 dispyscheduler - Scheduler at 192.168.0.103:51349
2018-01-03 12:59:54 dispyscheduler - TCP server at 127.0.1.1:51347
2018-01-03 12:59:54 dispyscheduler - TCP server at 192.168.0.103:51347
2018-01-03 12:59:54 dispyscheduler - Scheduler at 127.0.1.1:51349
2018-01-03 12:59:54 pycos - uncaught exception in !tcp_req/139968956623560:
Traceback (most recent call last):
  File "/home/dismine/project/env/lib/python3.5/site-packages/pycos/__init__.py", line 3730, in _schedule
    retval = task._generator.throw(*exc)
  File "/home/dismine/project/env/bin/dispyscheduler.py", line 368, in tcp_req
    msg = yield conn.recv_msg()
  File "/home/dismine/project/env/lib/python3.5/site-packages/pycos/__init__.py", line 3730, in _schedule
    retval = task._generator.throw(*exc)
  File "/home/dismine/project/env/lib/python3.5/site-packages/pycos/__init__.py", line 868, in _async_recv_msg
    data = yield self.recvall(n)
  File "/home/dismine/project/env/lib/python3.5/site-packages/pycos/__init__.py", line 461, in _recvall
    recvd = self._rsock.recv_into(view, len(view), *args)
  File "/usr/lib/python3.5/ssl.py", line 929, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.5/ssl.py", line 791, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib/python3.5/ssl.py", line 575, in read
    v = self._sslobj.read(len, buffer)
ssl.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1977) 

In this case dispynode.py doesn't show any error message either.
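For context on the error itself: one common way a TLS endpoint ends up reporting a failure such as WRONG_VERSION_NUMBER is that the bytes it receives are not a valid TLS record, for example because the peer answered in plain text. The sketch below reproduces that general situation on a single machine with a socket pair. It is an illustration under that assumption only, it does not use dispy or pycos, and the function name `handshake_against_plaintext` is hypothetical. The exact error reason can differ between OpenSSL versions, so the code only relies on `ssl.SSLError` being raised.

```python
import socket
import ssl


def handshake_against_plaintext():
    """Start a TLS handshake against a peer that replies with plain text.

    Returns the ssl.SSLError raised by the handshake, or None if no
    error occurred (which is not expected here).
    """
    client_sock, peer_sock = socket.socketpair()
    try:
        # The "peer" queues a non-TLS reply before the handshake starts.
        peer_sock.sendall(b"this is not a TLS record\n")

        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
        ctx.check_hostname = False       # no certificate checks are needed,
        ctx.verify_mode = ssl.CERT_NONE  # the handshake fails before them

        client_sock.settimeout(5)  # avoid hanging if something goes wrong
        tls = ctx.wrap_socket(client_sock, do_handshake_on_connect=False)
        try:
            tls.do_handshake()
        except ssl.SSLError as exc:
            return exc
        finally:
            tls.close()
    finally:
        peer_sock.close()
        client_sock.close()
    return None


if __name__ == "__main__":
    print(repr(handshake_against_plaintext()))
```

If the data arriving at one side is plain or corrupted for any reason, the TLS layer cannot parse it as a record, which is consistent with only one side logging an exception.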

I can also reproduce this error on CentOS 7 with Python 3.5 and on Windows Server.

It is worth mentioning that it works if I leave SSL enabled only between the cluster and the scheduler.

Can you explain where my mistake is?

dismine, Jan 03 '18 11:01

I have tested this with a Linux node and scheduler and an OS X client. It worked fine. I don't have a Windows computer to test the setup you used.

You can try different configurations to isolate the issue. For example, try different OS combinations (everything running Linux; scheduler and node running Linux but client running Windows; etc.) and different SSL setups (node and scheduler using SSL but not the client; scheduler and client using SSL but not the node; etc.). If you can isolate this to one combination, it may help.
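To keep track of which combinations have been tried, the configurations can be enumerated mechanically. This is only a sketch: the role names, the two operating systems, and the two link names are assumptions chosen for illustration, and `build_matrix` is a hypothetical helper, not part of dispy.

```python
from itertools import product

# Assumed roles, operating systems, and SSL-protected links (illustrative).
ROLES = ("client", "scheduler", "node")
SYSTEMS = ("Linux", "Windows")
LINKS = ("client-scheduler", "scheduler-node")


def build_matrix():
    """Return every combination of OS per role and SSL on/off per link."""
    configs = []
    for systems in product(SYSTEMS, repeat=len(ROLES)):
        for ssl_flags in product((False, True), repeat=len(LINKS)):
            configs.append({
                "os": dict(zip(ROLES, systems)),
                "ssl": dict(zip(LINKS, ssl_flags)),
            })
    return configs


if __name__ == "__main__":
    for number, config in enumerate(build_matrix(), start=1):
        print(number, config["os"], config["ssl"])
```

Recording a pass or fail next to each printed row makes it easier to see whether failures cluster around one OS pairing or one SSL link.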

I am also wondering whether the issue may be different (incompatible?) SSL versions across CentOS and Windows in your setup.
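One way to compare the SSL libraries is to run the same small script on each machine and compare the output. This is a minimal sketch using only the standard library; the function name `ssl_report` is hypothetical.

```python
import platform
import ssl
import sys


def ssl_report():
    """Collect the OS, Python, and OpenSSL versions of this interpreter."""
    return {
        "platform": platform.platform(),
        "python": sys.version.split()[0],
        "openssl": ssl.OPENSSL_VERSION,
        "openssl_version_info": ssl.OPENSSL_VERSION_INFO,
    }


if __name__ == "__main__":
    for key, value in ssl_report().items():
        print("%s: %s" % (key, value))
```

Note that this reports the OpenSSL build that Python itself is linked against, which may differ from what the `openssl version` command prints on the same machine.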

pgiri, Jan 08 '18 03:01

Hi,

I have just tested the scheduler on Linux with a node on Mac OS X, and it works. I don't know why, but pycos shows the error "fd (in my case 15 or 16) is not registered for reading!". It still works, though. Do you know how to properly set up SSL for Python on Windows? It looks like the problem is on the Windows side.

dismine, Jan 08 '18 11:01

I have done more testing. dispy works on Windows, but only partially. When I don't use SSL at all, it works: I can connect nodes on Windows and Linux. It also works when I use SSL for nodes, but only if they are on Windows; the scheduler ignores connections from a Linux node. It also doesn't work with a secure connection between the cluster and the scheduler. The errors are the same as in my first post.

dismine, Jan 08 '18 15:01

I just tested previous versions, with the same result: an SSL connection between a scheduler on Windows and a node on Linux doesn't work. To me it looks like an incompatibility between the Windows and Unix socket implementations.

dismine, Jan 09 '18 14:01

We should probably redirect this issue to pycos, because disabling I/O Completion Ports (IOCP) resolves the issue.

dismine, Jan 16 '18 09:01

Yes, apparently SSL with IOCP doesn't work. It seems to work if both sides are Windows with IOCP (I have only tested with both client and server on the same Windows machine, so it is unknown whether it works if the other side is remote). However, if the other side is not Windows, SSL doesn't work when used with IOCP.

pgiri, May 03 '18 01:05