clickhouse-odbc
clickhouse-odbc copied to clipboard
Issues with Null Values - Clickhouse ODBC Driver for Windows
I have been experiencing some issues when trying to query data that contains null values via the Clickhouse ODBC Driver for Windows. I have tested reading and writing from both Ubuntu and Windows. Reading from Windows always fails and reading from Ubuntu always succeeds, regardless of which platform was used to write.
For the sake of space, I won't include the utility I wrote to read and write to CH, but I can provide details as needed. It uses the following format to write:
cmd = f'cat {path} | clickhouse-client -h {server} --query="INSERT INTO {table} FORMAT Parquet"'
and the following format to read:
with pyodbc.connect(connection_str, autocommit=True) as con:
con.setencoding("utf-8")
df = pd.read_sql(qry, con)
return df
where the connection uses the Clickhouse ODBC Driver for Windows.
I use the datatype Nullable(Float64) for all the columns in my table.
Expected Behavior
Create example, write, and read via Ubuntu:

Actual Behavior
Using the same example from above, except we are using the Windows ODBC Driver:

Error:
---------------------------------------------------------------------------
Error Traceback (most recent call last)
<ipython-input-5-483e3b2a980f> in <module>
----> 1 windowsOutput = ch.query_df("SELECT * FROM testODBC")
H:\BOND_TRA\ATJ\Quant\Projects\DMA\clickhouse.py in query_df(qry)
127 with pyodbc.connect(connection_str, autocommit=True) as con:
128 con.setencoding("utf-8")
--> 129 df = pd.read_sql(qry, con)
130 return df
131
~\AppData\Local\Continuum\anaconda3\envs\DMA_CH\lib\site-packages\pandas\io\sql.py in read_sql(sql, con, index_col, coerce_float, params, parse_dates, columns, chunksize)
378 sql, index_col=index_col, params=params,
379 coerce_float=coerce_float, parse_dates=parse_dates,
--> 380 chunksize=chunksize)
381
382 try:
~\AppData\Local\Continuum\anaconda3\envs\DMA_CH\lib\site-packages\pandas\io\sql.py in read_query(self, sql, index_col, coerce_float, params, parse_dates, chunksize)
1475 parse_dates=parse_dates)
1476 else:
-> 1477 data = self._fetchall_as_list(cursor)
1478 cursor.close()
1479
~\AppData\Local\Continuum\anaconda3\envs\DMA_CH\lib\site-packages\pandas\io\sql.py in _fetchall_as_list(self, cur)
1484
1485 def _fetchall_as_list(self, cur):
-> 1486 result = cur.fetchall()
1487 if not isinstance(result, list):
1488 result = list(result)
Error: ('HY000', '[HY000] Syntax error: Not a valid floating-point number: á´ºáµ\x81ᴸᴸ (1) (SQLGetData)')
To ensure this isn't an issue related to writing via Window, here we use windows to read the table written by Linux:

Further, linux can read the data written by windows:

Now that is issue is specific to Windows reading, we can see that the issue stems from the Nan...
No Nan in query results:

Nan included in query results

Summary
It appears that the Windows ODBC Driver is not able to handle Nan values in query result
Thanks in advance and please let me know if I can provide additional information.
Can you please provide some minimal example of python code using some standard libs, so we can check/reproduce that? It it not very handy to reconstruct the code from screenshots and guess what those function df_to_ch, query_df etc. do and where do they came from.
You can just upload some snippet, and fragments of H:\BOND_TRA\ATJ\Quant\Projects\DMA\clickhouse.py to gist. Thank you.
Meanwhile, I tried with ODBCTest tool (in Windows) and isql tool (in Linux), and both, null's and nan's seem to work with either of those tools as expected in the recent code. I was trying with changes from this PR https://github.com/ClickHouse/clickhouse-odbc/pull/211 applied to the master branch locally, but the nature of the involved code suggests that the latest release (v1.1.1.20191108) should work in this case too.
Table created, populated, and verified as:
$ clickhouse-client --query "CREATE TABLE nullable_nan_col_issue_tbl ( nullable_nan_col Nullable(Float64) ) ENGINE = Memory"
$ clickhouse-client --query "INSERT INTO nullable_nan_col_issue_tbl VALUES ( NULL ), ( NAN ), (1.23456789)"
$ clickhouse-client --query "SELECT * FROM nullable_nan_col_issue_tbl"
\N
nan
1.2345678900000001
ODBCTool data retrieval result:
Full Connect(Default)
Env. Attr. SQL_ATTR_ODBC_VERSION set to SQL_OV_ODBC3
Successfully connected to DSN 'ClickHouse_DSNW_Debug'.
SQLExecDirect:
In: Statementhandle = 0x0000000002064290, StatementText = "SELECT * FROM nullable_nan_col_issue_tbl", Statementlength = 40
Return: SQL_SUCCESS=0
Get Data All:
"nullable_nan_col"
<Null>
nan
1.2345678900000001
3 rows fetched from 1 column.
isql data retrieval result:
$ isql ClickHouse_DSNW_Debug
+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| quit |
| |
+---------------------------------------+
SQL> SELECT * FROM nullable_nan_col_issue_tbl
+-------------------+
| nullable_nan_col |
+-------------------+
| |
| nan |
| 1.2345678900000001|
+-------------------+
SQLRowCount returns 3
3 rows fetched
SQL>