ibis icon indicating copy to clipboard operation
ibis copied to clipboard

feat: Collect MSSQL object into memory as an ibis object

Open KoderKow opened this issue 1 year ago • 3 comments

Is your feature request related to a problem?

From my research and going through the documentation, if I want to bring my ibis expr into memory, I need to execute() it. But I want to continue to use ibis expression after collecting. I wish there was a simple way to collect the ibis object into memory and keep it as an ibis object without having to execute (which makes it pandas) and then convert it back to an ibis object via memtable.

If there is already a straight way to do this, I apologize for missing it and would appreciate an example on how to do this. Currently I made a function that will execute and convert it to an ibis object.

What is the motivation behind your request?

I am either missing something or there are multiple steps I need to take to accomplish this task. I have not dug into if its extra compute to go from MSSQL -> pandas -> ibis (which I think is duckdb?).

Describe the solution you'd like

This would be ideal:

con = con_ibis(env)

# Selecting columns 
my_table = (
    con.table("my_table", database="dbo")
    .select(
        _.my_id,
        _.some_column
    )
)

my_table is server side at the moment, for my use case/project I want to bring it into memory, but retain the ibis properties so I can keep using the ibis API

my_table_in_memory = my_table.collect()

What version of ibis are you running?

9.5.0

What backend(s) are you using, if any?

mssql

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

KoderKow avatar Sep 24 '24 16:09 KoderKow

@KoderKow I think what you are looking for is .cache(), you can find an example in the docs https://ibis-project.org/reference/expression-tables.html#ibis.expr.types.relations.Table.cache

Would this do what you are looking for?

ncclementi avatar Sep 24 '24 17:09 ncclementi

Interesting I have not tried that. When I cache an expression, where is that saved? I see it makes SELECT * FROM [tempdb].[dbo].[##ibis_cached_d6rcpzztrfhcbaqpqijetubzdu], is that a temporary table in the database? I noticed once the connection closes the cache no longer works.

Is there a way to turn my con object that's mssql into an ibis polars by any chance? Currently I am collecting the data into memory from my ibis mssql object then creating an ibis polar object from that. I couldn't see if there is a way to update it with out the 2-3 extra steps.

KoderKow avatar Sep 24 '24 22:09 KoderKow

Hey @KoderKow -- when you cache an expression, it gets saved to a temporary table on the backend-engine, and those don't survive past the lifetime of the session (if you want an intermediate expression to persist, you can pass the expr to create_table).

Can you explain a little bit more about what you are trying to accomplish here? I think we're missing some context and it would help us to give you better advice.

gforsyth avatar Sep 25 '24 13:09 gforsyth

@KoderKow Depending on what you want to do, here are your options:

cache: persist in the database for the lifetime of the connection

Use this if you want to keep everything in the database. The cached table will be dropped on connection close.

to_polars: compute in the backend and return a polars object

Use this if you want to bring the results to the client.

To continue using the Ibis API, you'd call ibis.memtable(con.to_polars(some_expr)).

cpcloud avatar Nov 02 '24 12:11 cpcloud