dask-sql
[BUG] [GPU Error Bug] "SELECT ((<string>)LIKE(<column>)) FROM <table>" raises an error
What happened:
"SELECT ((<string>)LIKE(<column>)) FROM <table>" raises an error when the table is registered with gpu=True.
The same query returns a result when the table is registered with gpu=False (CPU).
What you expected to happen:
The query should run on GPU without raising an error, as it does on CPU.
Minimal Complete Verifiable Example:
import pandas as pd
import dask.dataframe as dd
from dask_sql import Context

c = Context()

df1 = pd.DataFrame({
    'c0': [0.6926717947094722],
    'c1': ['B'],
})
t1 = dd.from_pandas(df1, npartitions=1)

c.create_table('t1', t1, gpu=False)
c.create_table('t1_gpu', t1, gpu=True)

print('CPU Result:')
result1= c.sql("SELECT (('A')LIKE(t1.c1)) FROM t1").compute()
print(result1)

print('GPU Result:')
result2= c.sql("SELECT (('A')LIKE(t1_gpu.c1)) FROM t1_gpu").compute()
print(result2)
Result:
INFO:numba.cuda.cudadrv.driver:init
CPU Result:
Utf8("A") LIKE t1.c1
0 False
GPU Result:
Traceback (most recent call last):
  File "/tmp/bug.py", line 21, in <module>
    result2= c.sql("SELECT (('A')LIKE(t1_gpu.c1)) FROM t1_gpu").compute()
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/context.py", line 513, in sql
    return self._compute_table_from_rel(rel, return_futures)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/context.py", line 839, in _compute_table_from_rel
    dc = RelConverter.convert(rel, context=self)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rel/convert.py", line 61, in convert
    df = plugin_instance.convert(rel, context=context)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rel/logical/project.py", line 57, in convert
    new_columns[random_name] = RexConverter.convert(
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rex/convert.py", line 74, in convert
    df = plugin_instance.convert(rel, rex, dc, context=context)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rex/core/call.py", line 1129, in convert
    return operation(*operands, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rex/core/call.py", line 77, in __call__
    return self.f(*operands, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask_sql/physical/rex/core/call.py", line 402, in regex
    for char in regex:
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/dataframe/core.py", line 3997, in __iter__
    yield from s
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/utils/utils.py", line 288, in __iter__
    raise TypeError(
TypeError: Series object is not iterable. Consider using `.to_arrow()`, `.to_pandas()` or `.values_host` if you wish to iterate over the values.
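The traceback ends in a `for char in regex:` loop, which suggests the LIKE pattern is being walked character by character as if it were a scalar string, while here the pattern is a column. The following is a minimal sketch of that distinction in plain Python. It is not the dask-sql implementation: `like_to_regex`, `sql_like`, and `sql_like_column` are hypothetical names, and a Python list stands in for the column.

```python
import re


def like_to_regex(pattern):
    """Translate a scalar SQL LIKE pattern into a Python regex (hypothetical helper)."""
    if not isinstance(pattern, str):
        # A column of patterns cannot be walked character by character.
        raise TypeError("pattern must be a scalar string; apply it row by row for a column")
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")  # % matches any run of characters
        elif char == "_":
            parts.append(".")   # _ matches exactly one character
        else:
            parts.append(re.escape(char))
    return "".join(parts)


def sql_like(value, pattern):
    """Evaluate `value LIKE pattern` for two scalar strings."""
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None


def sql_like_column(value, patterns):
    """Evaluate `value LIKE pattern` once per row when the pattern is a column."""
    return [sql_like(value, p) for p in patterns]


# Mirrors the query in this report: 'A' LIKE c1, where c1 holds the single row 'B'.
print(sql_like_column("A", ["B"]))  # [False]
```

In this sketch the scalar helper rejects non-string patterns explicitly, and the column case is handled by building one regex per row. Whether dask-sql should take a similar per-row route for GPU-backed columns is an open question for the maintainers.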
Anything else we need to know?:
Environment:
- dask-sql version: 2023.6.0
- Python version: Python 3.10.11
- Operating System: Ubuntu 22.04
- Install method (conda, pip, source): Docker, using the image https://hub.docker.com/layers/rapidsai/rapidsai-dev/23.06-cuda11.8-devel-ubuntu22.04-py3.10/images/sha256-cfbb61fdf7227b090a435a2e758114f3f1c31872ed8dbd96e5e564bb5fd184a7?context=explore