dask-sql
Generalize dtype check in `python_to_sql_type`
Chatting with @DaceT exposed that `python_to_sql_type` only handles pandas / numpy types, which can cause issues when attempting to read in a table with a cuDF dtype, like so:
```python
import cudf
from dask_sql import Context

c = Context()
cdf = cudf.datasets.timeseries()
c.create_table("tble", cdf)

result = c.sql("""SELECT timestamp FROM tble""")
result.compute()
```

```
TypeError: unhashable type: 'CategoricalDtype'
```
A pretty simple workaround for this is generalizing the column dtype check so that instead of checking specifically for a `np.dtype`, we check whether the Python type is "dtype-like". I'm interested in whether there's a trivial util function like `is_dataframe_like`, but for dtypes?
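One possible shape for such a check (a sketch only, not necessarily what dask-sql should land on; the `is_dtype_like` name is hypothetical) is to lean on pandas' public dtype coercion, which accepts numpy dtypes as well as pandas/cuDF extension dtypes like `CategoricalDtype`:

```python
import numpy as np
import pandas as pd

def is_dtype_like(t) -> bool:
    """Return True if ``t`` can be interpreted as a column dtype.

    Hypothetical helper: ``pd.api.types.pandas_dtype`` accepts numpy
    dtypes and pandas-style extension dtypes (including cuDF's, which
    subclass pandas' ExtensionDtype), and raises TypeError otherwise.
    """
    try:
        pd.api.types.pandas_dtype(t)
        return True
    except TypeError:
        return False
```

With a helper like this, the mapping code could branch on "dtype-like" rather than `isinstance(t, np.dtype)`, so cuDF categorical columns would take the same path as their pandas equivalents.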
Codecov Report
Merging #402 (33a1801) into main (00df420) will increase coverage by 0.14%. The diff coverage is 100.00%.
```diff
@@            Coverage Diff             @@
##             main     #402      +/-   ##
==========================================
+ Coverage   88.94%   89.09%   +0.14%
==========================================
  Files          68       68
  Lines        3337     3337
  Branches      657      657
==========================================
+ Hits         2968     2973       +5
+ Misses        297      288       -9
- Partials       72       76       +4
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| dask_sql/mappings.py | 100.00% <100.00%> (ø) | |
| dask_sql/_version.py | 34.00% <0.00%> (+1.44%) | :arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 00df420...33a1801. Read the comment docs.
Things are a little more complicated than I initially assumed here; opened #423 to track this issue.