
[Bug]: apache-beam is unusable in recent python due to pinning an old dill library from 2019

Open • morotti opened this issue 1 year ago

What happened?

Hello,

apache-beam cannot be installed in any recent Python environment because it pins an old version of dill from 2019.

pip install "apache-beam>=2.57.0"
...
The conflict is caused by:
    apache-beam 2.59.0 depends on dill<0.3.2 and >=0.3.1.1
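To see why the resolver has nothing to choose from, here is a minimal sketch of intersecting two version ranges. It assumes a simplified numeric version comparison (real tools such as pip implement the full PEP 440 rules). `OTHER_DILL` (dill>=0.3.4) is a hypothetical stand-in for whatever another package in the environment requires, and the candidate list is only an illustrative subset of dill releases.

```python
import operator

# Comparison operators used in the requirement pairs below.
OPS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def parse(version):
    """Turn '0.3.1.1' into (0, 3, 1, 1).

    Simplification: numeric dotted versions only. Trailing zeros are dropped
    so that '0.3.2.0' compares equal to '0.3.2'.
    """
    parts = [int(p) for p in version.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(version, constraints):
    """True if `version` meets every (operator, bound) pair in `constraints`."""
    v = parse(version)
    return all(OPS[op](v, parse(bound)) for op, bound in constraints)


# Beam's requirement, from sdks/python/setup.py: dill>=0.3.1.1,<0.3.2
BEAM_DILL = [(">=", "0.3.1.1"), ("<", "0.3.2")]
# Hypothetical requirement of some other package in the same environment.
OTHER_DILL = [(">=", "0.3.4")]

# Illustrative subset of published dill versions.
candidates = ["0.3.1.1", "0.3.2", "0.3.4", "0.3.8"]

beam_only = [v for v in candidates if satisfies(v, BEAM_DILL)]
compatible = [v for v in beam_only if satisfies(v, OTHER_DILL)]

print(beam_only)   # ['0.3.1.1']
print(compatible)  # [] : no version satisfies both ranges
```

With only the lower bound `>=0.3.1.1` (the change this issue asks for), every candidate above would pass Beam's check and the intersection would contain 0.3.4 and 0.3.8.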

I have noticed that apache-beam 2.57.0+ is required to allow pyarrow 15+, which in turn is required by other recent tools and libraries.

It is impossible to install apache-beam in a recent Python environment because all releases of apache-beam effectively pin dill==0.3.1.1, which conflicts with other packages. dill 0.3.1.1 was released in September 2019, so it is extremely old; the latest Python version at the time was Python 3.7. For reference, the dill package did not provide official Python wheels before v0.3.4 in June 2021, so older versions have to be built from the source distribution. https://pypi.org/project/dill/#history


Could you please remove the upper bound on dill, i.e. change the requirement to dill>=0.3.1.1 in this file? https://github.com/apache/beam/blame/master/sdks/python/setup.py#L348

By the way, the old comment is incorrect. It was written 6 years ago, when dill was still at an early release, and the serialization has stabilized since then.

          # Dill doesn't have forwards-compatibility guarantees within minor
          # version. Pickles created with a new version of dill may not unpickle
          # using older version of dill. It is best to use the same version of
          # dill on client and server, therefore list of allowed versions is
          # very narrow. See: https://github.com/uqfoundation/dill/issues/341.
          'dill>=0.3.1.1,<0.3.2',
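The comment's concern is forward compatibility: an older reader may be unable to load data written by a newer writer. As an analogy only, the sketch below swaps in the standard-library `pickle` module instead of dill; it illustrates the general failure mode, not the specific dill changes behind the pin. It hand-builds a pickle stream that declares a protocol one higher than the running interpreter supports, which the unpickler rejects.

```python
import pickle


def newer_protocol_payload():
    """Build a minimal pickle stream that declares a protocol one higher
    than this interpreter's pickle module supports."""
    future_proto = pickle.HIGHEST_PROTOCOL + 1
    # 0x80 is the PROTO opcode, b"N" pushes None, b"." is STOP.
    return bytes([0x80, future_proto]) + b"N."


def try_load(payload):
    """Return 'loaded' on success, or the unpickler's error message."""
    try:
        pickle.loads(payload)
    except ValueError as exc:
        # The unpickler refuses streams from a protocol it does not know.
        return f"cannot unpickle: {exc}"
    return "loaded"


# Writer and reader on the same version: the round trip works.
print(try_load(pickle.dumps(None, protocol=pickle.HIGHEST_PROTOCOL)))  # loaded
# Writer "newer" than the reader: loading fails.
print(try_load(newer_protocol_payload()))
```

Keeping the same serializer version on the submitting client and the workers, as the setup.py comment recommends, avoids this class of failure at the cost of the narrow version range this issue is about.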

Regards.

Issue Priority

Priority: 1 (data loss / total loss of function)

I am picking the priority 1 (total loss of function) rating for this ticket, since being unable to install and use apache-beam is a total loss of function.

Issue Components

  • [X] Component: Python SDK

morotti, Oct 17 '24

You should be able to install the newer dill later by ignoring this conflict. If this causes any issue, you can also try cloudpickle .

liferoad, Oct 18 '24


Hi, with newer tools like uv and Poetry it is often not possible to ignore these conflicts, which causes quite a few undesirable behaviors. For example:

when depending on apache-beam and multiprocess at the same time, the dill pin implies multiprocess<=0.70.9, which is ancient and available only as an sdist. Building that sdist is not possible because dependency resolution in setuptools fails.

chebbyChefNEQ, Dec 06 '24

Also interested in seeing this cleaned up; I'm happy to provide a PR if you don't have time on your side.

l1n, Dec 31 '24

Please check this doc: https://s.apache.org/beam-cloudpickle-next-steps

liferoad, Dec 31 '24

One potential area where we could use some help is to vendor cloudpickle into beam's codebase if anyone is interested.

tvalentyn, Jan 02 '25

cc: @claudevdm

tvalentyn, Jan 02 '25

This has not been resolved with Apache Beam 2.62.0.

lexiehollis, Jan 23 '25

We plan to address this in Q1.

liferoad, Jan 24 '25

Let's use https://github.com/apache/beam/issues/21298 for updates and tracking this work in one place.

tvalentyn, Mar 18 '25