connector-x icon indicating copy to clipboard operation
connector-x copied to clipboard

Support JSONB fallback for polars

Open sslivkoff opened this issue 2 years ago • 2 comments

The problem

When loading postgres JSONB columns to polars, connectorx currently throws

RuntimeError: No conversion rule from JSONB(true) to connectorx::destinations::arrow::typesystem::ArrowTypeSystem.

Related: #121

The solution

For sqlite tables with JSON columns, connectorx simply loads these columns as strs without parsing them into JSON. This is nice fallback and it would be nice to have this fallback for postgresql as well instead of throwing an error.

This would be very helpful in situations where you are trying to load some set of rows that happen to have JSON columns. It would prevent the query from breaking and you could still decode the JSON text in the app if you need it.

Minimal example to reproduce

# create table and insert rows

import psycopg

connect_str = 'dbname={dbname} user={username}'

create_sql = 'CREATE TABLE IF NOT EXISTS example ( id BIGINT, name TEXT, note JSONB )'
insert_sql = 'INSERT INTO example VALUES (%s, %s, %s)'
insert_rows = [
    [1, 'first', psycopg.types.json.Jsonb({'version': 2})],
    [2, 'second', psycopg.types.json.Jsonb({'version': 3})],
    [3, 'third', psycopg.types.json.Jsonb({'version': 2})],
]
with psycopg.connect(connect_str) as conn:
    with conn.cursor() as cursor:
        cursor.execute(create_sql)
        cursor.executemany(insert_sql, insert_rows)
# query data with connectorx

import connectorx

result = connectorx.read_sql(
    conn='postgresql://{username}@localhost:5432/{dbname}',
    query='SELECT * FROM example',
    return_type='polars',
)

>>> RuntimeError: No conversion rule from JSONB(true) to connectorx::destinations::arrow::typesystem::ArrowTypeSystem.

sslivkoff avatar Jan 30 '23 20:01 sslivkoff

Polars now has a native "Binary" type. It seems that it would be a better fit, as well as more performant to just pass it in to the Binary type

universalmind303 avatar Feb 09 '23 05:02 universalmind303

The same happens with the redshift "super" type where usually json data is put into and at the moment gives an error of type not implemented.

marcosdetry avatar Feb 17 '23 22:02 marcosdetry