
Read in-process Python objects like DataFrame, NumPy, or dict

Open • auxten opened this issue 1 year ago • 1 comment

This PR is at a very early stage. The implementation may change significantly before the final patch.

I'm opening this PR now so that other projects can track the progress of "chDB on Pandas/NumPy..."

Related issues:

  • https://github.com/chdb-io/chdb/issues?q=is%3Aissue+is%3Aopen+label%3AArrow
  • https://github.com/ibis-project/ibis/pull/8497

auxten • Apr 12 '24 09:04

Still working on it. The good news is that the prototype works. The Python API could look like the example below. Any suggestions?

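```python
#!/usr/bin/env python3

import chdb


class myReader(chdb.PyReader):
    def __init__(self, data):
        # data: dict mapping column name -> list of values
        self.data = data
        self.cursor = 0
        super().__init__(data)

    def read(self, col_names, count):
        # count is ignored in this demo: all rows are returned in one block
        if self.cursor >= len(self.data["a"]):
            return []  # an empty list means there are no more rows
        # one list of values per requested column, in col_names order
        block = [self.data[col] for col in col_names]
        self.cursor += len(block[0])
        return block


reader = myReader(
    {
        "a": [1, 2, 3, 4, 5, 6],
        "b": ["tom", "jerry", "auxten", "tom", "jerry", "auxten"],
    }
)

# Python('reader') refers to the reader object above by its variable name
chdb.query("SELECT b, sum(a) FROM Python('reader') GROUP BY b", "debug").show()
```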

Output:

```
"tom",5
"auxten",9
"jerry",7
```
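The demo ignores `count` and returns every row in a single call. A reader that honors `count` could instead slice at most `count` rows from the cursor on each call. The sketch below is plain Python, not the chdb API: `ChunkedReader` and `group_sum` are hypothetical names, the class deliberately does not subclass `chdb.PyReader`, and the driver loop assumes (as the demo's `return []` suggests) that the caller keeps calling `read(col_names, count)` until it gets an empty list. Summing `a` per `b` block by block over the demo data reproduces the totals in the output above.

```python
from collections import defaultdict
from typing import Any, Dict, List


class ChunkedReader:
    """Hypothetical stand-in for the demo's myReader (does NOT subclass chdb.PyReader).

    Same read(col_names, count) shape, but returns at most `count` rows
    per call and an empty list once every row has been consumed.
    """

    def __init__(self, data: Dict[str, List[Any]]):
        self.data = data
        self.cursor = 0
        # Take the row count from any column rather than hard-coding "a".
        self.num_rows = len(next(iter(data.values()))) if data else 0

    def read(self, col_names: List[str], count: int) -> List[List[Any]]:
        if count <= 0:
            raise ValueError("count must be positive")
        if self.cursor >= self.num_rows:
            return []  # exhausted
        end = min(self.cursor + count, self.num_rows)
        # One slice per requested column, in col_names order.
        block = [self.data[col][self.cursor:end] for col in col_names]
        self.cursor = end
        return block


def group_sum(reader: ChunkedReader, key: str, value: str, count: int) -> Dict[Any, int]:
    """Hypothetical driver: pull blocks until exhausted, computing sum(value) GROUP BY key."""
    sums: Dict[Any, int] = defaultdict(int)
    while True:
        block = reader.read([key, value], count)
        if not block:
            break
        keys, values = block
        for k, v in zip(keys, values):
            sums[k] += v
    return dict(sums)


reader = ChunkedReader(
    {
        "a": [1, 2, 3, 4, 5, 6],
        "b": ["tom", "jerry", "auxten", "tom", "jerry", "auxten"],
    }
)
print(group_sum(reader, key="b", value="a", count=4))
# {'tom': 5, 'jerry': 7, 'auxten': 9}
```

With `count=4` the reader yields two blocks (rows 1 to 4, then rows 5 and 6). Because SQL does not guarantee the order of `GROUP BY` results without an `ORDER BY`, comparing totals per key, as the dict does, is more robust than comparing row order.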

auxten • Apr 29 '24 08:04