dask-sql
[BUG][GPU Logic Bug] "SELECT <column> FROM <table>" returns different results on CPU and GPU
What happened:
Running "SELECT <column> FROM <table>" through JDBC and through Python, against a CPU table and a GPU table, produces four different results.
What you expected to happen:
The CPU and GPU tables return the same result.
Minimal Complete Verifiable Example:
Queries via JDBC:
DROP SCHEMA IF EXISTS database0;
CREATE SCHEMA IF NOT EXISTS database0;
USE SCHEMA database0;
CREATE TABLE t1 WITH ( location = '/tmp/t1.csv', format = 'csv', gpu = FALSE );
CREATE TABLE t1_gpu WITH ( location = '/tmp/t1.csv', format = 'csv', gpu = TRUE );
t1.csv:
c0,c1,c2,c3
'', True, CAST((-127) AS TINYINT), 'Q,,p 4 v'
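Note that standard CSV quoting uses double quotes, so the single quotes in this row do not protect the commas inside 'Q,,p 4 v'; those commas act as delimiters. The sketch below uses only the Python standard library to write the file above and count the fields that the csv module (default quote character '"') sees in each line. The location comes from tempfile.gettempdir(), which is /tmp on a typical Linux system; that choice is an assumption made so the snippet runs anywhere.

```python
import csv
import os
import tempfile

# Same contents as t1.csv above. The values use single quotes, which the
# csv module does not treat as quote characters (its default is '"').
CSV_TEXT = "c0,c1,c2,c3\n'', True, CAST((-127) AS TINYINT), 'Q,,p 4 v'\n"

# On a typical Linux system this resolves to /tmp/t1.csv.
path = os.path.join(tempfile.gettempdir(), "t1.csv")
with open(path, "w", newline="") as f:
    f.write(CSV_TEXT)

# Read it back and compare the field counts of the header and the data row.
with open(path, newline="") as f:
    rows = list(csv.reader(f))

header, data = rows[0], rows[1]
print(len(header), header)  # 4 header names
print(len(data), data)      # 6 data fields
```

With six data fields against four header names, each CSV reader has to decide how to align the row, which may help explain why the CPU and GPU readers disagree.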
SQL:
SELECT t1.c2 FROM t1;
Result:
c2
------
NULL
(1 row)
SQL:
SELECT t1_gpu.c2 FROM t1_gpu;
Result:
c2
--------------------------
CAST((-127) AS TINYINT)
(1 row)
Query via Python:
import dask.dataframe as dd
from dask_sql import Context

c = Context()

# Read the CSV once with Dask and register it twice: once as a CPU table,
# once with gpu=True
t1 = dd.read_csv('/tmp/t1.csv')
c.create_table('t1', t1, gpu=False)
c.create_table('t1_gpu', t1, gpu=True)

print('CPU Result:')
result1 = c.sql("SELECT t1.c2 FROM t1").compute()
print(result1)

print('GPU Result:')
result2 = c.sql("SELECT t1_gpu.c2 FROM t1_gpu").compute()
print(result2)
Result:
INFO:numba.cuda.cudadrv.driver:init
CPU Result:
c2
'' True NaN
GPU Result:
c2
'' True <NA>
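The pandas read_csv documentation states that when a data row has more fields than the header, the leading surplus fields are used as the index, and empty fields are read as missing values (NaN). Assuming the Dask CPU read follows that rule, the sketch below shows which raw field lands in c2. The two helpers, align_like_pandas and align_left, are hypothetical illustrations, not pandas, Dask, or dask-sql functions. Under that assumption the index values match the '' True prefix printed in both Python results, and c2 is the empty field between the two commas of 'Q,,p 4 v'. A reader that instead maps fields from the left and drops the extras would put CAST((-127) AS TINYINT) in c2, which matches the JDBC GPU result.

```python
import csv
import io

CSV_TEXT = "c0,c1,c2,c3\n'', True, CAST((-127) AS TINYINT), 'Q,,p 4 v'\n"


def align_like_pandas(header, fields):
    """Hypothetical helper: mimic the documented pandas rule that surplus
    leading fields become the index when a row is longer than the header."""
    surplus = len(fields) - len(header)
    index = tuple(fields[:surplus]) if surplus > 0 else ()
    values = dict(zip(header, fields[max(surplus, 0):]))
    return index, values


def align_left(header, fields):
    """Hypothetical helper: map fields to columns from the left, dropping extras."""
    return dict(zip(header, fields))


header, fields = list(csv.reader(io.StringIO(CSV_TEXT)))
index, values = align_like_pandas(header, fields)
print(index, repr(values["c2"]))               # ("''", ' True') ''
print(repr(align_left(header, fields)["c2"]))  # ' CAST((-127) AS TINYINT)'
```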
Anything else we need to know?:
Environment:
- dask-sql version: 2023.6.0
- Python version: Python 3.10.11
- Operating System: Ubuntu 22.04
- Install method (conda, pip, source): Docker, using the image at https://hub.docker.com/layers/rapidsai/rapidsai-dev/23.06-cuda11.8-devel-ubuntu22.04-py3.10/images/sha256-cfbb61fdf7227b090a435a2e758114f3f1c31872ed8dbd96e5e564bb5fd184a7?context=explore