dask-sql
[ENH] Handle unproxying dask-cuda proxy_object returns from sqlmagic
I'm exploring different queries in IPython and want to inspect the results.
When I'm using jit unspilling in my LocalCUDACluster, Dask-SQL's sql magic returns a proxy object:
In [6]: %%sql
select count(*) from date_dim
Execution time: 0.35s
Out[6]: <dask_cuda.proxy_object.ProxyObject at 0x7fd13b3bcac0 of cudf.core.dataframe.DataFrame at 0x7fd0de38a9a0>
It'd be nice for Dask-SQL to handle unproxying the result for users.
As a workaround, I can manually do:
>>> from dask_cuda.proxy_object import unproxy
In [6]: %%sql
select count(*) from date_dim
Execution time: 0.35s
Out[6]: <dask_cuda.proxy_object.ProxyObject at 0x7fd13b3bcac0 of cudf.core.dataframe.DataFrame at 0x7fd0de38a9a0>
In [10]: unproxy(_)
Out[10]:
COUNT(UInt8(1))
0 73049
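A minimal sketch of the kind of handling I have in mind, not a proposal for the actual implementation. The helper name `maybe_unproxy` is hypothetical, and where the sql magic would call it is left open. It assumes `unproxy` returns non-proxy objects unchanged, and it falls back to returning the result as-is when dask-cuda is not installed:

```python
def maybe_unproxy(result):
    """Return the object behind a dask-cuda proxy, or the result unchanged.

    Hypothetical helper for illustration only; not part of dask-sql.
    """
    try:
        # Same import as in the manual workaround above.
        from dask_cuda.proxy_object import unproxy
    except ImportError:
        # dask-cuda is not installed, so the result cannot be a ProxyObject.
        return result
    # Assumption: unproxy passes non-proxy objects through untouched.
    return unproxy(result)
```

With something like this applied to the magic's return value, `Out[6]` would show the cudf DataFrame directly instead of the `ProxyObject` repr.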
I'm having some trouble reproducing this error. In Jupyter Lab, I have:
from dask_sql import Context
from dask_cuda import LocalCUDACluster
from distributed import Client
import os
cluster = LocalCUDACluster(n_workers=2, device_memory_limit="1GB", jit_unspill=True)
client = Client(cluster)
c = Context()
for table_name in os.listdir("/path/to/sf3000/parquet_2gb/"):
    c.create_table(
        table_name,
        f"/path/to/sf3000/parquet_2gb/{table_name}",
        format="parquet",
        gpu=True,
    )
c.ipython_magic()
%%sql
select count(*) from date_dim
| | COUNT(UInt8(1)) |
|---|---|
| 0 | 73049 |
Can you provide some more context about your setup?