databricks-sql-python
Unknown Error - Getting from Databricks SQL Python - From PyArrow module (pyarrow.lib.ArrowException)
Versions: Python: 3.11.9 Databricks SQL Connector: 3.3.0 pandas: 2.1.4 PyArrow: 16.1.0
Code: Below is the code I'm using to fetch data from Databricks. The data contains some UTF-8 encoded characters (�).
import os
from databricks import sql

sql_query = """ SELECT Column FROM table LIMIT 10 """
host = os.getenv("DATABRICKS_HOST")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
connection = sql.connect(server_hostname=host, http_path=http_path)
with connection.cursor() as cursor:
    cursor.execute(sql_query)
    # fetchmany here is my own helper (db_helper.py); it calls cursor.fetchmany(fetch_size)
    response = fetchmany(cursor)
Error:
response = fetchmany(cursor)
^^^^^^^^^^^^^^^^^
File "/usr/local/airflow/.local/lib/python3.11/site-packages/my_agent/app_helper/db_helper.py", line 35, in fetchmany
query_results = cursor.fetchmany(fetch_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/airflow/.local/lib/python3.11/site-packages/my_agent_connectors/dq_databricks/dependencies/databricks/sql/client.py", line 976, in fetchmany
return self.active_result_set.fetchmany(size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/airflow/.local/lib/python3.11/site-packages/my_agent_connectors/dq_databricks/dependencies/databricks/sql/client.py", line 1235, in fetchmany
return self._convert_arrow_table(self.fetchmany_arrow(size))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/airflow/.local/lib/python3.11/site-packages/my_agent_connectors/dq_databricks/dependencies/databricks/sql/client.py", line 1161, in _convert_arrow_table
df = table_renamed.to_pandas(
^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/array.pxi", line 884, in pyarrow.lib._PandasConvertible.to_pandas
File "pyarrow/table.pxi", line 4192, in pyarrow.lib.Table._to_pandas
File "/usr/local/airflow/.local/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 776, in table_to_dataframe
blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/airflow/.local/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 1131, in _table_to_blocks
return [_reconstruct_block(item, columns, extension_columns)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/airflow/.local/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 1131, in <listcomp>
return [_reconstruct_block(item, columns, extension_columns)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/airflow/.local/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 736, in _reconstruct_block
pd_ext_arr = pandas_dtype.__from_arrow__(arr)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/airflow/.local/lib/python3.11/site-packages/pandas/core/arrays/string_.py", line 230, in __from_arrow__
arr = arr.to_numpy(zero_copy_only=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/array.pxi", line 1581, in pyarrow.lib.Array.to_numpy
File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowException: Unknown error: Wrapping Ruta 204 � Zapote failed
Hi @Gobi2511, can you use fetchall from the cursor instead? Or is there a limitation that prevents you from doing this?
with connection.cursor() as cursor:
    cursor.execute(sql_query)
    response = cursor.fetchmany_arrow(10)
    print(response)
OR
with connection.cursor() as cursor:
    cursor.execute(sql_query)
    response = cursor.fetchmany(10)
@samikshya-db Even with fetchall I'm getting the same error. It fails while reading the data that contains � (UTF-8 encoded characters). It must be in the pyarrow dependency.
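
For anyone trying to narrow this down without a warehouse connection, here is a minimal sketch of one possible cause. It assumes (this is not confirmed anywhere in the thread) that the stored bytes are not valid UTF-8 and that � is only how the bad byte gets displayed. The byte 0x96 and the cp1252 guess are hypothetical, picked purely for illustration, and `find_invalid_utf8` is an illustrative helper, not part of the connector or PyArrow.

```python
# Hypothetical raw value: "Ruta 204", a stray 0x96 byte, then "Zapote".
# 0x96 is an en dash in Windows-1252 but is not valid UTF-8 on its own.
raw = b"Ruta 204 \x96 Zapote"


def find_invalid_utf8(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict decoding, raises on bad bytes
    except UnicodeDecodeError as exc:
        return exc.start
    return None


bad_offset = find_invalid_utf8(raw)

# Lenient decoding substitutes U+FFFD, which is what displays as the
# replacement character in logs and terminals.
lenient = raw.decode("utf-8", errors="replace")

# If the source system really wrote Windows-1252 (an assumption), this
# recovers the intended text instead.
recovered = raw.decode("cp1252")

print(bad_offset, repr(lenient), repr(recovered))
```

If the assumption holds, strict consumers of the string would be expected to reject the value while lenient ones show �, and the real fix would sit upstream in how the column was written rather than in the choice of fetch call.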