polars
fix(python): ensure pyarrow.compute module is loaded
Stumbled across a pyarrow lazy-loading race condition where pa.compute functions may not be available yet. It's difficult to cover in the test suite, since another test may already have triggered the module to be fully loaded, hiding the bug.
I believe the pyarrow docs recommend importing and using the compute module directly rather than depending on it being loaded on the root package. This change adds an explicit lazy-load dependency for the pyarrow.compute module.
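For anyone unfamiliar with why this happens: in Python, a submodule only becomes an attribute of its parent package once something has imported it. The sketch below illustrates that general behaviour with a throwaway package; `demo_pkg` and its `compute` submodule are hypothetical stand-ins created on the fly, not pyarrow itself, so treat it as an illustration of the mechanism rather than a reproduction of the pyarrow internals.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def make_demo_package(root: Path) -> None:
    """Create a throwaway package `demo_pkg` with a single submodule `compute`."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    # The package __init__ deliberately does NOT import its submodule.
    (pkg / "__init__.py").write_text("")
    (pkg / "compute.py").write_text("def cast(x):\n    return int(x)\n")


def submodule_visibility() -> Tuple[bool, bool]:
    """Return whether `demo_pkg.compute` is reachable before/after importing it."""
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            import demo_pkg

            # Nothing has imported the submodule yet, so the attribute is missing.
            before = hasattr(demo_pkg, "compute")

            # An explicit import loads the submodule and binds it on the parent.
            import demo_pkg.compute

            after = hasattr(demo_pkg, "compute")
            return before, after
        finally:
            # Clean up so repeated runs start from the same state.
            sys.path.remove(tmp)
            for name in ("demo_pkg.compute", "demo_pkg"):
                sys.modules.pop(name, None)


if __name__ == "__main__":
    print(submodule_visibility())
```

This is also why the bug is hard to pin down in a test suite: any earlier import of the submodule, anywhere in the process, makes the attribute lookup succeed.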
Reproduction Steps
import pyarrow as pa
import pyarrow.feather as feather
col = pa.chunked_array([["foo"], ["bar"]], type=pa.dictionary(pa.int8(), pa.string()))
table = pa.table([col], names=["a"])
feather.write_feather(table, "example.ipc")
import polars as pl
# import pyarrow.compute # enable workaround
pl.read_ipc("example.ipc", use_pyarrow=True)
Traceback (most recent call last):
  File "example.py", line 5, in <module>
    pl.read_ipc("example.ipc", use_pyarrow=True)
  File "polars/utils.py", line 394, in wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "polars/io.py", line 860, in read_ipc
    df = DataFrame._from_arrow(tbl, rechunk=rechunk)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "polars/internals/dataframe/frame.py", line 470, in _from_arrow
    return cls._from_pydf(arrow_to_pydf(data, columns=columns, rechunk=rechunk))
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "polars/internals/construction.py", line 936, in arrow_to_pydf
    column = coerce_arrow(column)
             ^^^^^^^^^^^^^^^^^^^^
  File "polars/internals/construction.py", line 1105, in coerce_arrow
    array = pa.compute.cast(
            ^^^^^^^^^^
  File "polars/dependencies.py", line 82, in __getattr__
    return getattr(module, attr)
           ^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/__init__.py", line 335, in __getattr__
    raise AttributeError(
AttributeError: module 'pyarrow' has no attribute 'compute'
@alexander-beedie could you take a look if this still makes sense regarding the lazy loading?
No problem; I have a block of time tomorrow afternoon 👍
I guess another option would be to put the import pyarrow.compute right inline in the coerce_arrow body, since it's only ever used there.
I like that more. Could you make this change?
All looks good to me; does seem that pyarrow wants that explicitly imported, but unless we're going to have more than one such import I think it's fine to special-case it and import inline.
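A rough sketch of what the inline-import approach looks like in general. This is not the actual coerce_arrow body; `cast_with_compute` is a hypothetical helper used only to show the pattern, and it assumes pyarrow is installed.

```python
def cast_with_compute(array, target_type):
    """Hypothetical helper: cast a pyarrow array via the compute module."""
    # Importing the submodule inside the function guarantees it is loaded
    # at the point of use, regardless of what other code imported earlier.
    import pyarrow.compute as pc

    return pc.cast(array, target_type)


if __name__ == "__main__":
    import pyarrow as pa

    ints = pa.array([1, 2, 3], type=pa.int8())
    print(cast_with_compute(ints, pa.int64()).type)
```

After the first call the import is just a lookup in sys.modules, so the cost of repeating it per call should be small, which fits the point above about special-casing a single call site.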
Great! Thanks @josh and @alexander-beedie