connector-x icon indicating copy to clipboard operation
connector-x copied to clipboard

Clickhouse fetching boolean type support error

Open kuatroka opened this issue 3 years ago • 13 comments

What language are you using?

Python.

What version are you using?

3.9

What database are you using?

Clickhouse

What dataframe are you using?

Pandas

Can you describe your bug?

I'm trying to access the clickhouse db, but I can't connect to it. As per the API doc on the read_sql() function I use SQLAlchemy connection string SQLALCHEMY_CLICKHOUSE_URL = 'clickhouse://default:default@localhost:8123/default' but it doesn't connect. I changed the clickhouse part to mysql, but it doesn't work either. I manually added ports 8123 to the string, but no luck. I add protocol="text", but still nothing.

To test the connection in general, I used a postgres connection string SQLALCHEMY_POSTGRES_URL = "postgresql://postgres:postgres@localhost:5432/public" and it worked.

I also tried to change the localhost for an explicitly stated IP, but to no avail.

If possible, please include a minimal simple example including:

Database setup if the error only happens on specific data or data type

The IDE on Windows 10 and clickhouse is installed on the same Windows machine but on the WSL2 part

Example query / code
SQLALCHEMY_CLICKHOUSE_URL =  'mysql://default:default@localhost/default'
query = """SELECT COUNT (DISTINCT path) FROM staging"""

df = cx.read_sql(SQLALCHEMY_CLICKHOUSE_URL, query)

What is the error?

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
c:\Users\playground\sql_playground.ipynb Cell 58' in <cell line: 1>()
----> [1](vscode-notebook-cell:/c%3A/Users/playground/sql_playground.ipynb#ch0000056?line=0) df = cx.read_sql(SQLALCHEMY_MYSQL_URL, query)

File c:\Users\venv\lib\site-packages\connectorx\__init__.py:224, in read_sql(conn, query, return_type, protocol, partition_on, partition_range, partition_num, index_col)
    221 except ModuleNotFoundError:
    222     raise ValueError("You need to install pandas first")
--> 224 result = _read_sql(
    225     conn,
    226     "pandas",
    227     queries=queries,
    228     protocol=protocol,
    229     partition_query=partition_query,
    230 )
    231 df = reconstruct_pandas(result)
    233 if index_col is not None:

RuntimeError: timed out waiting for connection

I do have pandas installed.

Thanks

kuatroka avatar Jun 23 '22 22:06 kuatroka

Hi @kuatroka , which version of connectorx are you using? If it is not the newest version (0.3.1-alpha.1), can you upgrade and see whether it works?

wangxiaoying avatar Jun 25 '22 22:06 wangxiaoying

Sure, I'll update and try it again. Which syntax for the connection do you suggest though? Referencing clickhouse or mysql?

kuatroka avatar Jun 26 '22 01:06 kuatroka

Hi @kuatroka , cx.read_sql("clickhouse://...", query) is the same with cx.read_sql("mysql://...", query, protocol="text"). Both of them should work.

wangxiaoying avatar Jun 27 '22 05:06 wangxiaoying

The same error... If I specify the port SQLALCHEMY_CLICKHOUSE_URL = 'clickhouse://default:[email protected]:8123/default' I get RuntimeError: timed out waiting for connection

If I don't specify the port SQLALCHEMY_CLICKHOUSE_URL = 'clickhouse://default:default@localhost/default' I get RuntimeError: timed out waiting for connection: DriverError { Could not connect to address 'localhost:3306': No connection could be made because the target machine actively refused it. (os error 10061) }

kuatroka avatar Jun 27 '22 13:06 kuatroka

Hi @kuatroka , it seems like the error indicates that the database cannot be reached (at least through the mysql protocol in connectorx). Can you try to use a mysql client tool to test whether it can be connected?

wangxiaoying avatar Jul 04 '22 04:07 wangxiaoying

Hi, I can't connect through the MySQL Workbench . I get this error message. (I have also tried with the port 9000 and no luck) image

but I have no issues using DBeaver tool when connecting to it or Arctype image Maybe I need to somehow specify that I need to use the JDBC way as it seems to be working perfectly.

image

Thanks

kuatroka avatar Jul 04 '22 11:07 kuatroka

Hi @kuatroka , we only support connecting to clickhouse with mysql protocol. I think it may be enabled by clickhouse through setting the mysql_port like mentioned here.

wangxiaoying avatar Jul 05 '22 01:07 wangxiaoying

Hi @wangxiaoying Yes, my installation has this port enabled too. I used the 9004 to connect with the standard python module import mysql.connector but the connectorx still gives me errors. The error messages changed though.

For the standard default user and when using the port 9004 I now get this RuntimeError: timed out waiting for connection: DriverError { Unknown authentication protocol: `sha256_password` }

I have created a new user pep in clickhouse and gives me this error RuntimeError: MySqlError { ERROR 48 (00000): Code: 48. DB::Exception: Command [ERRFMT] is not implemented. (NOT_IMPLEMENTED) (version 22.7.1.84 (official build)) }

It's obvious that with the new user I get the connection as it reaches the server version 22.7.1.84, but something else breaks Thanks

kuatroka avatar Jul 06 '22 15:07 kuatroka

Hi @kuatroka , Looks like the second error is similar to this. have you set the protocl=text or use clickhouse:// instead of mysql:// when getting this error: RuntimeError: MySqlError { ERROR 48 (00000): Code: 48. DB::Exception: Command [ERRFMT] is not implemented. (NOT_IMPLEMENTED) (version 22.7.1.84 (official build)) } ?

wangxiaoying avatar Jul 07 '22 23:07 wangxiaoying

Hi @wangxiaoying. Here go various error and the corresponding settings 1. connection_string = 'clickhouse://pep:pep@localhost:9004/default' Error PanicException: Could not retrieve core::option::Option<u8> from Value

2. connection_string = 'mysql://pep:pep@localhost:9004/default' and no protocol='text' Error RuntimeError: MySqlError { ERROR 48 (00000): Code: 48. DB::Exception: Command [ERRFMT] is not implemented. (NOT_IMPLEMENTED) (version 22.7.1.84 (official build)) }

3. connection_string = 'mysql://pep:pep@localhost:9004/default' plus protocol='text' Error PanicException: Could not retrieve core::option::Option<u8> from Value

kuatroka avatar Jul 08 '22 14:07 kuatroka

Hi @kuatroka , it looks like 1 and 3 can successfully connect to the database, but it throws error when retrieving the data Could not retrieve core::option::Option<u8> from Value. Can you narrow down which type of data that cause the failure?

wangxiaoying avatar Jul 13 '22 23:07 wangxiaoying

Hi @wangxiaoying. Thanks for the idea. I went through the 22 columns of the table and it's a boolean column that gives me the error below PanicException: Could not retrieve core::option::Option<u8> from Value

All the other columns are ok with no errors.

kuatroka avatar Jul 17 '22 18:07 kuatroka

It's definitely a problem with Boolean values. I cast the Boolean column into Integer and connector-x brought the data in with no errors The modified query for that one columns is this """SELECT column1::UInt16 FROM table LIMIT 5"""

kuatroka avatar Jul 21 '22 14:07 kuatroka