airflow
airflow copied to clipboard
Failed to retrieve data from Oracle database with UTF-8 charset
Apache Airflow Provider(s)
oracle
Versions of Apache Airflow Providers
apache-airflow-providers-oracle==3.1.0
Apache Airflow version
2.3.2 (latest released)
Operating System
Linux 4.19.79-1.el7.x86_64
Deployment
Docker-Compose
Deployment details
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0
Python: 3.8
Oracle database charset: UTF-8 (returned by SELECT value FROM nls_database_parameters WHERE parameter = 'NLS_NCHAR_CHARACTERSET')
Oracle's client environment:
- LC_CTYPE=C.UTF-8
- NLS_LANG=AMERICAN_AMERICA.CL8MSWIN1251
- LC_ALL=C.UTF-8
What happened
Any query to Oracle database with UTF8 charset failed with error:
oracledb.exceptions.NotSupportedError: DPY-3012: national character set id 871 is not supported by python-oracledb in thin mode
What you think should happen instead
Definetelly, it should work, as it was in previous Oracle provider version (3.0.0).
Quick search shows that python-oracledb package, which replaces cx_Oracle in 3.1.0, uses thin driver mode by default, and it seems that UTF-8 codepage is not supported in that mode ( see this issue ). In order to get to thick mode, a call to oracledb.init_oracle_client() is required before any connection made ( see here ).
Indeed, if I add this call to airflow/providers/oracle/hooks/oracle.py, everything works fine. Resulting code looks like this:
import math
import warnings
from datetime import datetime
from typing import Dict, List, Optional, Union
import oracledb
oracledb.init_oracle_client()
...
Downgrade to version 3.0.0 also helps, but I suppose it should be some permanent solution, like adding a configuration parameter or so.
How to reproduce
- Setup an Oracle database with UTF8 charset
- Setup an Airflow connection with
oracletype - Create an operator which issues a
SELECTstatement against the database
Anything else
Task execution log as follows:
[2022-06-23, 17:35:36 MSK] {task_command.py:370} INFO - Running <TaskInstance: nip-stage-load2.load-dict.load-sa_user scheduled__2022-06-22T00:00:00+00:00 [running]> on host dwh_develop_scheduler [2022-06-23, 17:35:37 MSK] {taskinstance.py:1569} INFO - Exporting the following env vars: [email protected] AIRFLOW_CTX_DAG_OWNER=airflow AIRFLOW_CTX_DAG_ID=nip-stage-load2 AIRFLOW_CTX_TASK_ID=load-dict.load-sa_user AIRFLOW_CTX_EXECUTION_DATE=2022-06-22T00:00:00+00:00 AIRFLOW_CTX_TRY_NUMBER=1 AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-06-22T00:00:00+00:00 [2022-06-23, 17:35:37 MSK] {base.py:68} INFO - Using connection ID 'nip_standby' for task execution. [2022-06-23, 17:35:37 MSK] {base.py:68} INFO - Using connection ID 'stage' for task execution. [2022-06-23, 17:35:37 MSK] {data_transfer.py:198} INFO - Executing: SELECT * FROM GMP.SA_USER [2022-06-23, 17:35:37 MSK] {base.py:68} INFO - Using connection ID 'nip_standby' for task execution. [2022-06-23, 17:35:37 MSK] {taskinstance.py:1889} ERROR - Task failed with exception Traceback (most recent call last): File "/home/airflow/.local/lib/python3.8/site-packages/dwh_etl/operators/data_transfer.py", line 265, in execute if not self.no_check and self.compare_datasets(self.object_name, src, dest): File "/home/airflow/.local/lib/python3.8/site-packages/dwh_etl/operators/data_transfer.py", line 199, in compare_datasets src_df = src.get_pandas_df(sql) File "/home/airflow/.local/lib/python3.8/site-packages/airflow/hooks/dbapi.py", line 128, in get_pandas_df with closing(self.get_conn()) as conn: File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/oracle/hooks/oracle.py", line 149, in get_conn conn = oracledb.connect(**conn_config) File "/home/airflow/.local/lib/python3.8/site-packages/oracledb/connection.py", line 1000, in connect return conn_class(dsn=dsn, pool=pool, params=params, **kwargs) File "/home/airflow/.local/lib/python3.8/site-packages/oracledb/connection.py", line 128, in init impl.connect(params_impl) File "src/oracledb/impl/thin/connection.pyx", line 345, in oracledb.thin_impl.ThinConnImpl.connect File "src/oracledb/impl/thin/connection.pyx", line 163, in oracledb.thin_impl.ThinConnImpl._connect_with_params File "src/oracledb/impl/thin/connection.pyx", line 129, in oracledb.thin_impl.ThinConnImpl._connect_with_description File "src/oracledb/impl/thin/connection.pyx", line 250, in oracledb.thin_impl.ThinConnImpl._connect_with_address File "src/oracledb/impl/thin/protocol.pyx", line 197, in oracledb.thin_impl.Protocol._connect_phase_two File "src/oracledb/impl/thin/protocol.pyx", line 263, in oracledb.thin_impl.Protocol._process_message File "src/oracledb/impl/thin/protocol.pyx", line 242, in oracledb.thin_impl.Protocol._process_message File "src/oracledb/impl/thin/messages.pyx", line 280, in oracledb.thin_impl.Message.process File "src/oracledb/impl/thin/messages.pyx", line 2094, in oracledb.thin_impl.ProtocolMessage._process_message File "/home/airflow/.local/lib/python3.8/site-packages/oracledb/errors.py", line 103, in _raise_err raise exc_type(_Error(message)) from cause oracledb.exceptions.NotSupportedError: DPY-3012: national character set id 871 is not supported by python-oracledb in thin mode
Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for opening your first issue here! Be sure to follow the issue template!
There are two things you can do here: @skolchin (and I'd love if you could contribute it):
- Documentation on provider 3.1.0 should mention that only UTF-8 can be used
or (better)
- a "thick_mode" parameter could be added to initialization code and initialization
Since you have good testing environment for that - maybe you could add PR for hte latter? Shall I assign it to you @skolchin ?
I did it provisionally since you marked "willing to submit a PR" :)
@skolchin Were you able to work on this at all? I was testing using the newer oracle provider today and ran into an issue (see https://github.com/oracle/python-oracledb/issues/47) that can also be resolved by allowing for use of the thick client. I should have time to work on a PR tomorrow if you don't already have something ready to go.