Remove pyarrow as hard dependency

Open kylebarron opened this issue 1 year ago • 1 comments

Is your feature request related to a problem? Please describe.

pyarrow is a massive, monolithic dependency. It can be hard to install in some places, and can't currently be installed in Pyodide. It's certainly a monumental effort to get it to work in Pyodide, but I think it would be valuable for lonboard to wean off of pyarrow.

The core enabling factor here is the Arrow PyCapsule Interface. It lets Python Arrow libraries exchange Arrow data at the C level without copying. This means we can interoperate at no cost with any user who's already using pyarrow, without being required to use pyarrow ourselves. I've been promoting its use throughout the Python Arrow ecosystem (https://github.com/apache/arrow/issues/39195#issuecomment-2245718008), and I'm hoping it grows into something as core to tabular data processing as the buffer protocol is to numpy.
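As a rough illustration of why this removes the need for a hard dependency: the PyCapsule Interface is duck-typed, so a consumer can check for the `__arrow_c_stream__`, `__arrow_c_array__`, or `__arrow_c_schema__` dunder methods instead of checking for pyarrow types. The sketch below uses only the standard library; `arrow_export_kind` and `FakeTable` are hypothetical names made up for this example and are not part of lonboard, pyarrow, or arro3.

```python
from typing import Any, Optional


def arrow_export_kind(obj: Any) -> Optional[str]:
    """Report which Arrow PyCapsule export method, if any, `obj` provides.

    Hypothetical helper: it only inspects attribute names and never
    imports pyarrow or any other Arrow library.
    """
    if hasattr(obj, "__arrow_c_stream__"):
        return "stream"  # table-like or chunked data
    if hasattr(obj, "__arrow_c_array__"):
        return "array"  # a single array or record batch
    if hasattr(obj, "__arrow_c_schema__"):
        return "schema"  # a schema, field, or data type
    return None


class FakeTable:
    """Stand-in for any producer, such as a pyarrow or arro3 table."""

    def __arrow_c_stream__(self, requested_schema=None):
        # A real producer returns a PyCapsule wrapping an ArrowArrayStream.
        raise NotImplementedError("illustration only")


print(arrow_export_kind(FakeTable()))
print(arrow_export_kind(object()))
```

A consumer written this way accepts input from any library that implements the interface, which is what makes swapping the underlying Arrow implementation feasible.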

As part of working to build the ecosystem, I created arro3, a new, very minimal Python Arrow implementation that wraps the Rust Arrow implementation.

I think that it should be possible to swap out pyarrow for arro3, which is about 1% of the normal pyarrow installation size.

It's also symbiotic for the ecosystem if Lonboard shows the benefits of modular Arrow libraries in Python.

Describe the solution you'd like

We'll keep pyarrow as a required dependency for GeoPandas/Pandas interop. pyarrow has implemented pyarrow.Table.from_pandas and that's not something I want to even think about replicating.

But aside from that, pretty much everything is doable in arro3 and geoarrow-rust.
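One common way to confine a heavy package to a single code path such as the GeoPandas/Pandas interop is to import it lazily, only when that path is actually used. This is a minimal sketch of that general pattern, not a description of how lonboard does it; `require_optional` and its error message are hypothetical.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, feature: str) -> ModuleType:
    """Import `module_name` on demand, with a clear error if it is missing.

    Hypothetical helper: call it inside the function that needs the
    package rather than at module import time.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for {feature}; "
            f"install it to use this feature."
        ) from exc


# Example with a stdlib module so the snippet runs anywhere.
json_module = require_optional("json", "the demo")
print(json_module.dumps({"ok": True}))
```

With this pattern, users who never pass a pandas or GeoPandas object never trigger the import, so the package does not need to be installed for them.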

CLI only:

Other notes:

  • Add numpy as direct dependency

kylebarron avatar Jul 24 '24 21:07 kylebarron

Primarily closed by #582

kylebarron avatar Aug 08 '24 21:08 kylebarron

This will be closed with https://github.com/developmentseed/lonboard/pull/598 and #601

kylebarron avatar Aug 19 '24 17:08 kylebarron