dask-sql

[QST] NotImplementedError: The python type string is not implemented (yet)

Open luzhengyang opened this issue 2 years ago • 5 comments

What is your question?

I keep getting this error when trying to query a table created from dask dataframe reading a csv file. A couple of columns in the csv file are strings. I've tried multiple ways to convert the pyarrow string type but none of them worked and the type remained unchanged. How should I proceed?

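import dask.dataframe as dd
from dask_sql import Context

# Read the CSV lazily and inspect the inferred column types
df = dd.read_csv("../sales.csv")
print(df.dtypes)

# Register the dataframe as a SQL table and query it
c = Context()
c.create_table("sales", df)
result = c.sql("SELECT * FROM sales").compute()
print(result)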

/ArrowFlightService/lib/python3.9/site-packages/dask_sql/mappings.py", line 120, in python_to_sql_type
    raise NotImplementedError(
NotImplementedError: The python type string is not implemented (yet)
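To narrow down which columns are involved, it can help to list the columns that carry pandas' extension string dtype instead of plain `object`. This is a minimal sketch: `string_columns` is a hypothetical helper (not part of dask or dask-sql), and the small pandas frame stands in for the real CSV. It assumes the error is raised for columns of this dtype, which matches the message but is not confirmed here.

```python
import pandas as pd


def string_columns(dtypes):
    """Return the names of columns whose dtype is pandas' extension string dtype.

    `dtypes` is the Series returned by `df.dtypes` (pandas or dask dataframe).
    Hypothetical helper for illustration only.
    """
    return [name for name, dtype in dtypes.items() if isinstance(dtype, pd.StringDtype)]


# Hypothetical stand-in data: one extension-string column, one object column,
# and one numeric column.
pdf = pd.DataFrame(
    {
        "name": pd.array(["alice", "bob"], dtype="string"),
        "city": pd.Series(["Paris", "Oslo"], dtype=object),
        "x": [0.1, 0.9],
    }
)

flagged = string_columns(pdf.dtypes)
print(flagged)
```

The same call should work on a dask dataframe, since `df.dtypes` is a pandas Series there as well.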

luzhengyang, Sep 18 '23 18:09

Thanks for raising the issue @luzhengyang. Could you also share the dask and dask-sql versions you're using in this example?

ayushdg, Sep 19 '23 08:09

My assumption here is that we're getting bitten by Dask's eager conversion of object columns to pyarrow strings, which we haven't been able to fully support yet (working on this in #1220). Are you able to disable this eager conversion with dask.config.set({"dataframe.convert-string": False})? I'd be interested to know whether that unblocks things for you.

charlesbluca, Sep 22 '23 15:09

As discussed in Discourse, the basic documentation example reproduces this error, but disabling eager conversion fixes it.

import dask.datasets
df = dask.datasets.timeseries()
from dask_sql import Context

c = Context()
c.create_table("timeseries", df, persist=True)
result = c.sql("""
    SELECT
        name, SUM(x) AS "sum"
    FROM timeseries
    WHERE x > 0.5
    GROUP BY name
""")
result.compute()

guillaumeeb, Mar 27 '24 18:03

For now, I've disabled eager string conversion in #1260 so that users aren't hit by this breakage by default.

charlesbluca, Apr 17 '24 13:04

It works for me with Python 3.8.19. I hit the issue above when using Python 3.9 with dask 2023.5.0 and dask_sql 2023.11.0.

hebian1994, May 26 '24 09:05