
[ENH] Handle unproxying dask-cuda proxy_object returns from sqlmagic

Open • randerzander opened this issue 3 years ago • 1 comment

I'm exploring different queries in IPython and want to inspect the results.

When I enable JIT unspilling (jit_unspill=True) on my LocalCUDACluster, Dask-SQL's SQL magic returns a proxy object instead of the underlying DataFrame:

In [6]: %%sql
select count(*) from date_dim             

Execution time: 0.35s
Out[6]: <dask_cuda.proxy_object.ProxyObject at 0x7fd13b3bcac0 of cudf.core.dataframe.DataFrame at 0x7fd0de38a9a0> 

It'd be nice for Dask-SQL to handle unproxying the result for users.

As a workaround, I can manually do:

>>> from dask_cuda.proxy_object import unproxy

In [6]: %%sql 
select count(*) from date_dim             
Execution time: 0.35s
Out[6]: <dask_cuda.proxy_object.ProxyObject at 0x7fd13b3bcac0 of cudf.core.dataframe.DataFrame at 0x7fd0de38a9a0> 

In [10]: unproxy(_)
Out[10]: 
   COUNT(UInt8(1))
0            73049
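
A minimal sketch of what "handling unproxying" could look like, assuming the result is passed through a small helper before it is displayed. The name `maybe_unproxy` is hypothetical and not part of Dask-SQL; the only real API used is `dask_cuda.proxy_object.unproxy`, the same function as in the workaround above. The import is guarded so the helper is a no-op when dask-cuda is not installed.

```python
def maybe_unproxy(result):
    """Return the object behind a dask-cuda proxy, or `result` unchanged.

    Hypothetical helper for illustration; not part of Dask-SQL.
    """
    try:
        # Same function used in the manual workaround above.
        from dask_cuda.proxy_object import unproxy
    except ImportError:
        # dask-cuda is not installed, so the result cannot be a proxy object.
        return result
    # unproxy() returns non-proxy objects as they are.
    return unproxy(result)
```

In the session above, `maybe_unproxy(_)` would stand in for `unproxy(_)`. Where Dask-SQL would call something like this inside the magic is left open here.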

randerzander avatar Dec 08 '22 20:12 randerzander

I'm having trouble reproducing this behavior. In JupyterLab, I have:

from dask_sql import Context
from dask_cuda import LocalCUDACluster
from distributed import Client
import os

cluster = LocalCUDACluster(n_workers=2, device_memory_limit="1GB", jit_unspill=True)
client = Client(cluster)

c = Context()

# Register every Parquet dataset in the directory as a GPU-backed table
data_dir = "/path/to/sf3000/parquet_2gb"
for table_name in os.listdir(data_dir):
    c.create_table(
        table_name,
        os.path.join(data_dir, table_name),
        format="parquet",
        gpu=True,
    )

c.ipython_magic()

%%sql
select count(*) from date_dim

   COUNT(UInt8(1))
0            73049

Can you provide some more context about your setup?

sarahyurick avatar Dec 16 '22 21:12 sarahyurick