ibis icon indicating copy to clipboard operation
ibis copied to clipboard

feat(postgres): more flexible map type implementation

Open cpcloud opened this issue 1 year ago • 1 comments

Discussed in https://github.com/ibis-project/ibis/discussions/10483

Originally posted by augcollet November 13, 2024 Hello,

I need your help to resolve a specific problem...

From the following data with postgresql backend :

import ibis
from ibis import _
import os

con = ibis.postgres.connect(
    user=os.getenv('POSTGRES_USER'),
    password=os.getenv('POSTGRES_PASSWORD'),
    host="postgres",
    port=os.getenv('POSTGRES_PORT'),
    database=os.getenv('POSTGRES_DB'),
)
ibis.set_backend(con)

t=ibis.memtable({
    'client_id':[0,1,0,2,3,0,1,2,3],
    'product':['a', 'b', 'a', 'a', 'b', 'c', 'a', 'a', 'b'],
    'amount':[1.2, 2.5, 4.2, 12.7, 1.2, 3.8, 1.4, 3.8, 3],
})

image

I'm trying to perform the following calculation : image

I tried the following approach:

data=(
    t.group_by(['client_id', 'product'])
    .agg(
        sum_amount=_['amount'].sum()
    )
    .group_by(['client_id'])
    .agg(
        products_and_sum_amounts=ibis.map(
            _['product'].collect(),
            _['sum_amount'].collect()
        )
    )
)
data.execute()

I get the following error : image

It seems that ibis uses hstore to store data from a .map, which is incompatible with numeric values.

I have to cast the values ​​to a string before using .collect to get a result. image

How can I get around this? For example, how can I build a JSON object instead of MapValue?

( My goal is to exploit the resulting pandas dataset to use it with a DictVectorizer under sklearn. https://scikit-learn.org/1.5/modules/generated/sklearn.feature_extraction.DictVectorizer.html )

Thank you in advance for your support!

cpcloud avatar Nov 13 '24 10:11 cpcloud

Hello, is this supposed to be fixed in 10.5.0 release ? https://github.com/ibis-project/ibis/commit/ff6100aeebe8ad19d84a19451e29a2a374d96cd5

augcollet avatar May 13 '25 14:05 augcollet