
loky fails for Python 3.8 when importing ipyparallel 6.2.5

Open basnijholt opened this issue 5 years ago • 9 comments

In an effort to support loky in Adaptive (https://github.com/python-adaptive/adaptive/pull/263), we see that loky fails in the CI for all Python 3.8 tests. See the build logs.

I see in this PR https://github.com/joblib/loky/pull/232

> We recently temporarily removed the Python 3.8 entries of the CI due to a failing test caused by a reference cycle in early Python 3.8 versions. Now that this bug is fixed upstream, we can skip the failing test on the appropriate Python versions where this bug exists, and restore the rest of the CI suite.

However, no cause is specified. "This bug is fixed upstream": where upstream?

The traceback:

E       RuntimeError: An error occured while evaluating "learner.function(-1.0)". See the traceback for details.:
E       
E       loky.process_executor._RemoteTraceback: 
E       '''
E       Traceback (most recent call last):
E         File "d:\a\1\s\.tox\py38-alldeps\lib\site-packages\loky\process_executor.py", line 391, in _process_worker
E           call_item = call_queue.get(block=True, timeout=timeout)
E         File "c:\hostedtoolcache\windows\python\3.8.2\x64\lib\multiprocessing\queues.py", line 116, in get
E           return _ForkingPickler.loads(res)
E         File "d:\a\1\s\.tox\py38-alldeps\lib\site-packages\ipyparallel\serialize\codeutil.py", line 24, in code_ctor
E           return types.CodeType(*args)
E       TypeError: an integer is required (got type bytes)
E       '''
E       
E       The above exception was the direct cause of the following exception:
E       
E       Traceback (most recent call last):
E         File "D:\a\1\s\adaptive\runner.py", line 193, in _process_futures
E           y = fut.result()
E         File "c:\hostedtoolcache\windows\python\3.8.2\x64\lib\concurrent\futures\_base.py", line 432, in result
E           return self.__get_result()
E         File "c:\hostedtoolcache\windows\python\3.8.2\x64\lib\concurrent\futures\_base.py", line 388, in __get_result
E           raise self._exception
E       loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

These tests do not fail for Python 3.6 and 3.7. Finally, this test also passes locally for Python 3.8!

basnijholt avatar Apr 09 '20 19:04 basnijholt

Hi! Thank you for the report.

I believe the quoted message and the loky PR you linked are unrelated to your problem. In the traceback you posted, loky simply signals that the worker failed to unserialize a task. In particular, the ipyparallel reducer/reconstructor used to serialize code objects looks out of date (the code object constructor changed in a backward-incompatible way in Python 3.8 with PEP 570), and thus fails in Python 3.8.
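To illustrate the kind of breakage described above, here is a minimal sketch. It assumes Python 3.8 or later and does not reproduce ipyparallel's actual reducer: `legacy_args` is a hypothetical tuple laid out the way the Python 3.7 constructor expected it, without the `posonlyargcount` field that PEP 570 inserted in second position. The exact error message varies between Python versions.

```python
import types


def linear(x):
    return x


code = linear.__code__

# Positional layout accepted by Python 3.7 and earlier: there is no
# posonlyargcount between co_argcount and co_kwonlyargcount.
legacy_args = (
    code.co_argcount,
    code.co_kwonlyargcount,
    code.co_nlocals,
    code.co_stacksize,
    code.co_flags,
    code.co_code,
    code.co_consts,
    code.co_names,
    code.co_varnames,
    code.co_filename,
    code.co_name,
    code.co_firstlineno,
    b"",  # placeholder for the line-number table
    code.co_freevars,
    code.co_cellvars,
)

try:
    # On Python 3.8+ every field after co_argcount is shifted by one,
    # so the constructor rejects the call with a TypeError.
    types.CodeType(*legacy_args)
    legacy_error = None
except TypeError as exc:
    legacy_error = exc

print(type(legacy_error).__name__, legacy_error)

# CodeType.replace() (added in Python 3.8) copies a code object without
# spelling out the full positional signature.
renamed = code.replace(co_name="renamed_linear")
print(renamed.co_name)
```

A reconstructor that passes through whatever argument tuple the reducer captured is only correct as long as both sides agree on the constructor signature, which is why a reducer written for 3.7 can fail at load time on 3.8.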

The only worrying bit on the loky side is the fact that ipyparallel is used to serialize code objects (loky should use cloudpickle instead, which supports PEP 570). To understand this I would need an MCVE. In any case, feel free to also look for related (un)serialization bug reports in ipyparallel.

PS: by "upstream", I mean the CPython code base (https://github.com/python/cpython)

pierreglaser avatar Apr 09 '20 20:04 pierreglaser

Thanks for your detailed look!

I've been able to make a minimal example, where after creating an ipyparallel.Client the exception is raised with the following code:

from ipyparallel import Client

def linear(x):
    return x

import loky
loky_executor = loky.get_reusable_executor()
futs = loky_executor.map(linear, range(10))
list(futs)

This is on macOS with Python 3.8.

basnijholt avatar Apr 09 '20 21:04 basnijholt

The reproducer looks great, thanks. I'll investigate and see whether the fix should be on the loky end or on the ipyparallel end.

pierreglaser avatar Apr 09 '20 21:04 pierreglaser

I've simplified the above code. Actually, just importing ipyparallel will make loky fail!

basnijholt avatar Apr 09 '20 21:04 basnijholt

@pierreglaser, I am relatively sure that this is an ipyparallel problem.

It's fixed by https://github.com/ipython/ipyparallel/pull/379, which hasn't made it into a release yet, unfortunately.

It does, however, mean that loky is broken whenever ipyparallel is imported.

basnijholt avatar Apr 10 '20 09:04 basnijholt

> It does, however, mean that loky is broken whenever ipyparallel is imported.

Yes, and it should not.

Are you sure cloudpickle is properly installed on your system?

~I actually can reproduce locally only if cloudpickle is not installed.~ My bad, I actually can reproduce with cloudpickle installed.

pierreglaser avatar Apr 10 '20 09:04 pierreglaser

(py38) basnijholt-imac  ➜  ~  python
Python 3.8.2 | packaged by conda-forge | (default, Mar 23 2020, 17:55:48)
[Clang 9.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from loky.backend.reduction import get_loky_pickler_name
>>> print(get_loky_pickler_name())
cloudpickle

I don't think it's because of my environment. I've just tried it on a remote cluster with CentOS and get the same:

QUANTUM-NFS-SERVER-001  ➜  ~  python
Python 3.8.1 | packaged by conda-forge | (default, Jan 29 2020, 14:55:04)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ipyparallel import Client

>>>
>>> def linear(x):
...     return x
...
>>> import loky
>>> loky_executor = loky.get_reusable_executor()
>>> futs = loky_executor.map(linear, range(10))
>>> list(futs)
loky.process_executor._RemoteTraceback:
'''
Traceback (most recent call last):
  File "/gscratch/home/a-banijh/miniconda3/envs/majoanalysis/lib/python3.8/site-packages/loky/process_executor.py", line 391, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
  File "/gscratch/home/a-banijh/miniconda3/envs/majoanalysis/lib/python3.8/multiprocessing/queues.py", line 116, in get
    return _ForkingPickler.loads(res)
  File "/gscratch/home/a-banijh/miniconda3/envs/majoanalysis/lib/python3.8/site-packages/ipyparallel/serialize/codeutil.py", line 24, in code_ctor
    return types.CodeType(*args)
TypeError: an integer is required (got type bytes)
'''

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/gscratch/home/a-banijh/miniconda3/envs/majoanalysis/lib/python3.8/site-packages/loky/process_executor.py", line 794, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/gscratch/home/a-banijh/miniconda3/envs/majoanalysis/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/gscratch/home/a-banijh/miniconda3/envs/majoanalysis/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/gscratch/home/a-banijh/miniconda3/envs/majoanalysis/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
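Since the traceback shows reconstruction going through a reconstructor that lives in ipyparallel, a related check is to look at which reducer, if any, is registered globally for code objects. The sketch below uses only the standard library; `describe_code_reducer` is a hypothetical helper name, and it only inspects `copyreg.dispatch_table`, not any pickler-specific table.

```python
import copyreg
import types


def describe_code_reducer():
    """Report which copyreg reducer, if any, handles code objects."""
    reducer = copyreg.dispatch_table.get(types.CodeType)
    if reducer is None:
        return "no copyreg reducer registered for code objects"
    module = getattr(reducer, "__module__", "<unknown>")
    name = getattr(reducer, "__qualname__", repr(reducer))
    return f"code objects are reduced by {module}.{name}"


print(describe_code_reducer())
```

Running it once before and once after the suspect import should show whether that import changed the global table.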

basnijholt avatar Apr 10 '20 09:04 basnijholt

Thanks.

pierreglaser avatar Apr 10 '20 09:04 pierreglaser

Ok, I just realized we let copyreg-registered reducers override cloudpickle reducers. So if a module like ipyparallel registers faulty reducers with copyreg, loky will fail. I'm not sure we want to change this behavior though.
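The effect of that priority can be sketched with the standard library alone. This is not loky's or cloudpickle's actual code: `faulty_reduce_code` and `working_reduce_code` are hypothetical stand-ins for an outdated third-party reducer and a library's own reducer, and the two merged dictionaries only illustrate which entry wins when both tables define a reducer for the same type.

```python
import copyreg
import io
import marshal
import operator
import pickle
import types


def linear(x):
    return x


def faulty_reduce_code(code):
    # Stand-in for an outdated reducer: at load time operator.index()
    # receives bytes and raises TypeError.
    return operator.index, (code.co_code,)


def working_reduce_code(code):
    # Stand-in for a library reducer that round-trips via marshal.
    return marshal.loads, (marshal.dumps(code),)


def dumps_with_table(obj, table):
    buffer = io.BytesIO()
    pickler = pickle.Pickler(buffer)
    pickler.dispatch_table = table  # per-pickler reducer table
    pickler.dump(obj)
    return buffer.getvalue()


# Simulate an import that registers a reducer globally.
previous = copyreg.dispatch_table.get(types.CodeType)
copyreg.pickle(types.CodeType, faulty_reduce_code)
try:
    library_table = {types.CodeType: working_reduce_code}
    # Later entries win: here the copyreg entry overrides the library one.
    copyreg_wins = {**library_table, **copyreg.dispatch_table}
    # Reversed priority: the library reducer overrides the copyreg one.
    library_wins = {**copyreg.dispatch_table, **library_table}

    try:
        pickle.loads(dumps_with_table(linear.__code__, copyreg_wins))
        load_error = None
    except TypeError as exc:
        load_error = exc

    restored_code = pickle.loads(dumps_with_table(linear.__code__, library_wins))
finally:
    # Undo the global registration so the sketch has no side effects.
    if previous is None:
        copyreg.dispatch_table.pop(types.CodeType, None)
    else:
        copyreg.dispatch_table[types.CodeType] = previous

restored = types.FunctionType(restored_code, {})
print(type(load_error).__name__, restored(3))
```

Note that dumping succeeds in both cases; the faulty reducer only shows up when the payload is loaded, which matches the failure surfacing in the worker rather than in the parent process.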

pierreglaser avatar Apr 10 '20 10:04 pierreglaser