[Bug]: apache-beam is unusable in recent python due to pinning an old dill library from 2019
What happened?
Hello,
apache-beam cannot be installed on any recent python environment because it is pinning an old version of dill from 2019.
pip install 'apache-beam>=2.57.0'
...
The conflict is caused by:
apache-beam 2.59.0 depends on dill<0.3.2 and >=0.3.1.1
I have noticed apache-beam 2.57.0+ is required to allow pyarrow 15+, which is required by other recent tools/libraries.
It is impossible to install apache-beam in a recent Python environment because all releases of apache-beam pin dill==0.3.1.1, which conflicts with other packages.
dill 0.3.1.1 was released in September 2019, which is extremely old. The latest Python version at the time was Python 3.7.
For reference, the dill package did not provide official Python wheel packages before v0.3.4 in June 2021, so older versions have to be built from source to be used.
https://pypi.org/project/dill/#history
Could you please remove the pinning of dill?
Changing the requirement to dill>=0.3.1.1 in this file would fix it:
https://github.com/apache/beam/blame/master/sdks/python/setup.py#L348
The old comment is incorrect, by the way. dill was an early release 6 years ago when that comment was written, and the serialization has stabilized since then.
# Dill doesn't have forwards-compatibility guarantees within minor
# version. Pickles created with a new version of dill may not unpickle
# using older version of dill. It is best to use the same version of
# dill on client and server, therefore list of allowed versions is
# very narrow. See: https://github.com/uqfoundation/dill/issues/341.
'dill>=0.3.1.1,<0.3.2',
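To illustrate how narrow this pin is, here is a minimal sketch that checks a few dill versions mentioned in this issue against the specifier quoted above. The helper names `parse` and `satisfies_beam_pin` are hypothetical, and the comparison is a simplification: it compares dotted numeric versions as integer tuples and does not implement full PEP 440 rules (pre-releases, zero padding), which pip applies.

```python
def parse(version):
    """Turn a dotted numeric version such as '0.3.1.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_beam_pin(version):
    """Return True if `version` falls within dill>=0.3.1.1,<0.3.2.

    Simplified check: plain tuple comparison, not a full PEP 440 parser.
    """
    return parse("0.3.1.1") <= parse(version) < parse("0.3.2")


# Versions taken from this issue: the pinned release, the excluded upper
# bound, and the first release reported above to ship official wheels.
for candidate in ["0.3.1.1", "0.3.2", "0.3.4"]:
    print(candidate, satisfies_beam_pin(candidate))
```

Under these assumptions, only 0.3.1.1 passes the check, so any package that requires a newer dill cannot be resolved together with apache-beam.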
Regards.
Issue Priority
Priority: 1 (data loss / total loss of function)
I am picking the priority 1 (total loss of function) rating for this ticket, as being unable to install and use apache-beam is a total loss of function.
Issue Components
- [X] Component: Python SDK
You should be able to install the newer dill later by ignoring this conflict. If this causes any issue, you can also try cloudpickle.
Hi, with newer tools like uv and poetry it is often not possible to ignore these conflicts. This causes quite a few undesirable behaviors. For example, when depending on apache-beam and multiprocess at the same time, the dill pin implies multiprocess<=0.70.9, which is ancient and sdist-only. Building the sdist is not possible, as the dependency resolution in setuptools fails.
Also interested in this being cleaned up - I'm happy to provide a PR if folks don't have time on your side.
Please check this doc: https://s.apache.org/beam-cloudpickle-next-steps
One potential area where we could use some help is to vendor cloudpickle into beam's codebase if anyone is interested.
cc: @claudevdm
This has not been resolved with Apache Beam 2.62.0.
We plan to address this in Q1.
Let's use https://github.com/apache/beam/issues/21298 for updates and tracking this work in one place.