
bug(postgres): `to_pyarrow` fails with json type

Open • turntable-justin opened this issue 1 year ago • 2 comments

What happened?

Hi there, I'm trying to convert a Postgres table to PyArrow and am getting this error:

ArrowNotImplementedError: extension

I dug into the Ibis backend to see which types Ibis infers, and found that it builds this Arrow array type:

struct<id: string not null, created_at: timestamp[us, tz=UTC], response: extension<ibis.json<JSONType>>, email: string>

And here is the ibis expression:

r0 := DatabaseTable: waitlist
  id         !uuid
  created_at timestamp('UTC')
  response   json
  email      string
Limit[r0, n=10000]

What version of ibis are you using?

8.0.0

What backend(s) are you using, if any?

Postgres

Relevant log output

--------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
Cell In[20], line 1
----> 1 w.execution_helper().to_pyarrow()

File ~/Documents/GitHub/spoonbill/ibis/expr/types/core.py:444, in Expr.to_pyarrow(self, params, limit, **kwargs)
    416 @experimental
    417 def to_pyarrow(
    418     self,
   (...)
    422     **kwargs: Any,
    423 ) -> pa.Table:
    424     """Execute expression and return results in as a pyarrow table.
    425
    426     This method is eager and will execute the associated expression
   (...)
    442         A pyarrow table holding the results of the executed expression.
    443     """
--> 444     return self._find_backend(use_default=True).to_pyarrow(
    445         self, params=params, limit=limit, **kwargs
    446     )

File ~/Documents/GitHub/spoonbill/ibis/backends/base/__init__.py:367, in _FileIOHandler.to_pyarrow(self, expr, params, limit, **kwargs)
    363 arrow_schema = schema.to_pyarrow()
    364 with self.to_pyarrow_batches(
    365     table_expr, params=params, limit=limit, **kwargs
    366 ) as reader:
--> 367     table = pa.Table.from_batches(reader, schema=arrow_schema)
    369 return expr.__pyarrow_result__(
    370     table.rename_columns(table_expr.columns).cast(arrow_schema)
    371 )

File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/table.pxi:4104, in pyarrow.lib.Table.from_batches()

File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/ipc.pxi:666, in pyarrow.lib.RecordBatchReader.__next__()

File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/ipc.pxi:700, in pyarrow.lib.RecordBatchReader.read_next_batch()

File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/types.pxi:88, in pyarrow.lib._datatype_to_pep3118()

File ~/Documents/GitHub/spoonbill/ibis/backends/base/sql/__init__.py:245, in <genexpr>(.0)
    242 array_type = schema.as_struct().to_pyarrow()
    243 print(array_type)
    244 arrays = (
--> 245     pa.array(map(tuple, batch), type=array_type)
    246     for batch in self._cursor_batches(
    247         expr, params=params, limit=limit, chunk_size=chunk_size
    248     )
    249 )
    250 batches = map(pa.RecordBatch.from_struct_array, arrays)
    252 return pa.ipc.RecordBatchReader.from_batches(schema.to_pyarrow(), batches)

File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/array.pxi:344, in pyarrow.lib.array()

File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/array.pxi:42, in pyarrow.lib._sequence_to_array()

File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/error.pxi:154, in pyarrow.lib.pyarrow_internal_check_status()

File ~/Library/Caches/pypoetry/virtualenvs/vinyl-cwr2Pa_2-py3.11/lib/python3.11/site-packages/pyarrow/error.pxi:91, in pyarrow.lib.check_status()

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

turntable-justin • Feb 12 '24 18:02

@cpcloud, confirmed that this still happens with v9.0.0.

turntable-justin • Feb 16 '24 14:02

Yep, this is a problem in the Postgres `to_pyarrow` implementation, which wasn't touched much in the latest big refactor.

cpcloud • Feb 16 '24 15:02

I think the underlying `execute()` function also uses `to_pyarrow`, so the connector is effectively blocked. Is there an alternative that will work? What's the ETA for a fix?

turntable-justin • Feb 21 '24 16:02

If you definitely need a PyArrow Table, you should be able to use something like

import pyarrow as pa
table = pa.Table.from_pandas(expr.to_pandas())

cpcloud • Feb 21 '24 16:02

Not 100% sure what the ETA is. It will probably be in the next release, but no promises :)

cpcloud • Feb 21 '24 17:02

Ok, will try that. Thank you!


turntable-justin • Feb 22 '24 06:02