ibis
bug(postgres): `to_pyarrow` fails with json type
What happened?
Hi there, I'm trying to convert a Postgres table to PyArrow and am getting this error:
ArrowNotImplementedError: extension
I went into the Ibis backend to check which types Ibis infers, and found that it builds this struct array type:
struct<id: string not null, created_at: timestamp[us, tz=UTC], response: extension<ibis.json<JSONType>>, email: string>
And here is the Ibis expression:
r0 := DatabaseTable: waitlist
  id         !uuid
  created_at timestamp('UTC')
  response   json
  email      string

Limit[r0, n=10000]
What version of ibis are you using?
8.0.0
What backend(s) are you using, if any?
Postgres
Relevant log output
--------------------------------------------------------------------------
ArrowNotImplementedError Traceback (most recent call last)
Cell In[20], line 1
----> 1 w.execution_helper().to_pyarrow()
File ~/Documents/GitHub/spoonbill/ibis/expr/types/core.py:444, in Expr.to_pyarrow(self, params, limit, **kwargs)
416 @experimental
417 def to_pyarrow(
418 self,
(...)
422 **kwargs: Any,
423 ) -> pa.Table:
424 """Execute expression and return results in as a pyarrow table.
425
426 This method is eager and will execute the associated expression
(...)
442 A pyarrow table holding the results of the executed expression.
443 """
--> 444 return self._find_backend(use_default=True).to_pyarrow(
445 self, params=params, limit=limit, **kwargs
446 )
File ~/Documents/GitHub/spoonbill/ibis/backends/base/__init__.py:367, in _FileIOHandler.to_pyarrow(self, expr, params, limit, **kwargs)
363 arrow_schema = schema.to_pyarrow()
364 with self.to_pyarrow_batches(
365 table_expr, params=params, limit=limit, **kwargs
366 ) as reader:
--> 367 table = pa.Table.from_batches(reader, schema=arrow_schema)
369 return expr.__pyarrow_result__(
370 table.rename_columns(table_expr.columns).cast(arrow_schema)
371 )
File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/table.pxi:4104, in pyarrow.lib.Table.from_batches()
File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/ipc.pxi:666, in pyarrow.lib.RecordBatchReader.__next__()
File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/ipc.pxi:700, in pyarrow.lib.RecordBatchReader.read_next_batch()
File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/types.pxi:88, in pyarrow.lib._datatype_to_pep3118()
File ~/Documents/GitHub/spoonbill/ibis/backends/base/sql/__init__.py:245, in <genexpr>(.0)
242 array_type = schema.as_struct().to_pyarrow()
243 print(array_type)
244 arrays = (
--> 245 pa.array(map(tuple, batch), type=array_type)
246 for batch in self._cursor_batches(
247 expr, params=params, limit=limit, chunk_size=chunk_size
248 )
249 )
250 batches = map(pa.RecordBatch.from_struct_array, arrays)
252 return pa.ipc.RecordBatchReader.from_batches(schema.to_pyarrow(), batches)
File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/array.pxi:344, in pyarrow.lib.array()
File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/array.pxi:42, in pyarrow.lib._sequence_to_array()
File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/error.pxi:154, in pyarrow.lib.pyarrow_internal_check_status()
File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/error.pxi:91, in pyarrow.lib.check_status()
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
@cpcloud, confirmed that this still happens with v9.0.0.
Yep, this is a problem in the Postgres `to_pyarrow` implementation, which wasn't touched much in the latest big refactor.
I think the underlying `execute()` function also uses `to_pyarrow`, so the connector is effectively blocked. Is there an alternative that will work, and what is the ETA for a fix?
If you definitely need a PyArrow Table, you should be able to use something like:
pa.Table.from_pandas(expr.to_pandas())
Not 100% sure what the ETA is. It will probably be in the next release, but no promises :)
OK, will try that. Thank you!
(Reply sent by email to Phillip Cloud's comment of Feb 21, 2024, 11:02 AM: https://github.com/ibis-project/ibis/issues/8318#issuecomment-1957315636)