clickhouse-odbc icon indicating copy to clipboard operation
clickhouse-odbc copied to clipboard

Issues with Null Values - Clickhouse ODBC Driver for Windows

Open ajennings67 opened this issue 6 years ago • 2 comments

I have been experiencing some issues when trying to query data that contains null values via the Clickhouse ODBC Driver for Windows. I have tested reading and writing from both Ubuntu and Windows. Reading from Windows always fails and reading from Ubuntu always succeeds, regardless of which platform was used to write.

For the sake of space, I won't include the utility I wrote to read and write to CH, but I can provide details as needed. It uses the following format to write:

cmd = f'cat {path} | clickhouse-client -h {server} --query="INSERT INTO {table} FORMAT Parquet"'

and the following format to read:

    with pyodbc.connect(connection_str, autocommit=True) as con:
        con.setencoding("utf-8")
        df = pd.read_sql(qry, con)
    return df

where the connection uses the Clickhouse ODBC Driver for Windows.

I use the datatype Nullable(Float64) for all the columns in my table.

Expected Behavior

Create example, write, and read via Ubuntu: image

Actual Behavior

Using the same example from above, except we are using the Windows ODBC Driver: image

Error:

---------------------------------------------------------------------------
Error                                     Traceback (most recent call last)
<ipython-input-5-483e3b2a980f> in <module>
----> 1 windowsOutput = ch.query_df("SELECT * FROM testODBC")

H:\BOND_TRA\ATJ\Quant\Projects\DMA\clickhouse.py in query_df(qry)
    127     with pyodbc.connect(connection_str, autocommit=True) as con:
    128         con.setencoding("utf-8")
--> 129         df = pd.read_sql(qry, con)
    130     return df
    131 

~\AppData\Local\Continuum\anaconda3\envs\DMA_CH\lib\site-packages\pandas\io\sql.py in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize)
    378             sql, index_col=index_col, params=params,
    379             coerce_float=coerce_float, parse_dates=parse_dates,
--> 380             chunksize=chunksize)
    381 
    382     try:

~\AppData\Local\Continuum\anaconda3\envs\DMA_CH\lib\site-packages\pandas\io\sql.py in read_query(self, sql, index_col, coerce_float, params, parse_dates, chunksize)
   1475                                         parse_dates=parse_dates)
   1476         else:
-> 1477             data = self._fetchall_as_list(cursor)
   1478             cursor.close()
   1479 

~\AppData\Local\Continuum\anaconda3\envs\DMA_CH\lib\site-packages\pandas\io\sql.py in _fetchall_as_list(self, cur)
   1484 
   1485     def _fetchall_as_list(self, cur):
-> 1486         result = cur.fetchall()
   1487         if not isinstance(result, list):
   1488             result = list(result)

Error: ('HY000', '[HY000] Syntax error: Not a valid floating-point number: á´ºáµ\x81ᴸᴸ (1) (SQLGetData)')

To ensure this isn't an issue related to writing via Window, here we use windows to read the table written by Linux: image

Further, linux can read the data written by windows: image

Now that is issue is specific to Windows reading, we can see that the issue stems from the Nan...

No Nan in query results: image

Nan included in query results image

Summary

It appears that the Windows ODBC Driver is not able to handle Nan values in query result

Thanks in advance and please let me know if I can provide additional information.

ajennings67 avatar Aug 26 '19 14:08 ajennings67

Can you please provide some minimal example of python code using some standard libs, so we can check/reproduce that? It it not very handy to reconstruct the code from screenshots and guess what those function df_to_ch, query_df etc. do and where do they came from.

You can just upload some snippet, and fragments of H:\BOND_TRA\ATJ\Quant\Projects\DMA\clickhouse.py to gist. Thank you.

filimonov avatar Nov 12 '19 20:11 filimonov

Meanwhile, I tried with ODBCTest tool (in Windows) and isql tool (in Linux), and both, null's and nan's seem to work with either of those tools as expected in the recent code. I was trying with changes from this PR https://github.com/ClickHouse/clickhouse-odbc/pull/211 applied to the master branch locally, but the nature of the involved code suggests that the latest release (v1.1.1.20191108) should work in this case too.

Table created, populated, and verified as:

$ clickhouse-client --query "CREATE TABLE nullable_nan_col_issue_tbl ( nullable_nan_col Nullable(Float64) ) ENGINE = Memory"
$ clickhouse-client --query "INSERT INTO nullable_nan_col_issue_tbl VALUES ( NULL ), ( NAN ), (1.23456789)"
$ clickhouse-client --query "SELECT * FROM nullable_nan_col_issue_tbl"
\N
nan
1.2345678900000001

ODBCTool data retrieval result:

Full Connect(Default)

	Env. Attr. SQL_ATTR_ODBC_VERSION set to SQL_OV_ODBC3

	Successfully connected to DSN 'ClickHouse_DSNW_Debug'.
SQLExecDirect:
				In:				Statementhandle = 0x0000000002064290, StatementText = "SELECT * FROM nullable_nan_col_issue_tbl", Statementlength = 40
				Return:	SQL_SUCCESS=0

Get Data All:
"nullable_nan_col"
<Null>
nan
1.2345678900000001
3 rows fetched from 1 column.

isql data retrieval result:

$ isql ClickHouse_DSNW_Debug
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> SELECT * FROM nullable_nan_col_issue_tbl
+-------------------+
| nullable_nan_col  |
+-------------------+
|                   |
| nan               |
| 1.2345678900000001|
+-------------------+
SQLRowCount returns 3
3 rows fetched
SQL> 

traceon avatar Nov 13 '19 08:11 traceon