ibis
feat: support UUIDs to pyarrow on more backends
partially fixes #8902.
Implements UUID execution to pyarrow on some backends, and adds notimpl tests for the rest.
OK, I think this brings up a larger philosophical question: Do we want to totally separate the pandas and pyarrow codepaths, or can they rely on each other?
Currently, to get pyarrow results from a backend:
- For some backends, we go straight from the DB cursor object to pyarrow arrays, never needing pandas.
- For the backends that I touch in this PR, we go through db_cursor -> pandas -> pyarrow.
I think the coupling between pandas and pyarrow for this conversion isn't inherently bad (we don't need to implement the db -> pyarrow path!), but I agree that it should be isolated so that it is very clear where we are mixing these two ecosystems. That way, for the backends that don't need the conversion, you can have just pyarrow installed, without pandas.
So I see two options:
1. Keep this db_cursor -> pandas -> pyarrow path, but sequester it into a third module that is external to both ibis/formats/pandas.py and ibis/formats/pyarrow.py.
2. In the backends that don't have it yet, implement the db_cursor -> pyarrow conversion directly.
I think I would lean towards 2. I want to remove reliance on pandas as much as possible. Possibly this implementation won't be that hard for these other backends.
I think we'd like to eventually be able to offer Ibis without requiring pyarrow or pandas, or at least without requiring pandas. Many systems are starting to have arrow-native endpoints that don't involve pandas, so db -> pyarrow is actually better for those cases.
There's also the potential of using something that doesn't depend on either of those for the core (like printing tables), so I think we'd like to keep things as isolated from one another as possible.
What's more, sending anything through pandas is likely to result in some kind of type or value alteration that doesn't happen with pyarrow. Especially with NULLs, pandas is likely to do something completely different from, and incompatible with, what pyarrow would do.
Ok, when I get back to this I'll try the db -> arrow method!
Is this PR still viable?
Viable, I just stopped needing it personally, so its urgency dropped a lot compared to the 5 million other PRs I have open haha. Feel free to close if you want, and re-open once someone actually finds time to work on it.