airflow icon indicating copy to clipboard operation
airflow copied to clipboard

Failed to retrieve data from Oracle database with UTF-8 charset

Open skolchin opened this issue 3 years ago • 4 comments

Apache Airflow Provider(s)

oracle

Versions of Apache Airflow Providers

apache-airflow-providers-oracle==3.1.0

Apache Airflow version

2.3.2 (latest released)

Operating System

Linux 4.19.79-1.el7.x86_64

Deployment

Docker-Compose

Deployment details

Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 Python: 3.8 Oracle database charset: UTF-8 (returned by SELECT value FROM nls_database_parameters WHERE parameter = 'NLS_NCHAR_CHARACTERSET') Oracle's client environment:

  • LC_CTYPE=C.UTF-8
  • NLS_LANG=AMERICAN_AMERICA.CL8MSWIN1251
  • LC_ALL=C.UTF-8

What happened

Any query to Oracle database with UTF8 charset failed with error:

oracledb.exceptions.NotSupportedError: DPY-3012: national character set id 871 is not supported by python-oracledb in thin mode

What you think should happen instead

Definetelly, it should work, as it was in previous Oracle provider version (3.0.0).

Quick search shows that python-oracledb package, which replaces cx_Oracle in 3.1.0, uses thin driver mode by default, and it seems that UTF-8 codepage is not supported in that mode ( see this issue ). In order to get to thick mode, a call to oracledb.init_oracle_client() is required before any connection made ( see here ).

Indeed, if I add this call to airflow/providers/oracle/hooks/oracle.py, everything works fine. Resulting code looks like this:

import math
import warnings
from datetime import datetime
from typing import Dict, List, Optional, Union

import oracledb
oracledb.init_oracle_client()
...

Downgrade to version 3.0.0 also helps, but I suppose it should be some permanent solution, like adding a configuration parameter or so.

How to reproduce

  • Setup an Oracle database with UTF8 charset
  • Setup an Airflow connection with oracle type
  • Create an operator which issues a SELECT statement against the database

Anything else

Task execution log as follows:

[2022-06-23, 17:35:36 MSK] {task_command.py:370} INFO - Running <TaskInstance: nip-stage-load2.load-dict.load-sa_user scheduled__2022-06-22T00:00:00+00:00 [running]> on host dwh_develop_scheduler [2022-06-23, 17:35:37 MSK] {taskinstance.py:1569} INFO - Exporting the following env vars: [email protected] AIRFLOW_CTX_DAG_OWNER=airflow AIRFLOW_CTX_DAG_ID=nip-stage-load2 AIRFLOW_CTX_TASK_ID=load-dict.load-sa_user AIRFLOW_CTX_EXECUTION_DATE=2022-06-22T00:00:00+00:00 AIRFLOW_CTX_TRY_NUMBER=1 AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-06-22T00:00:00+00:00 [2022-06-23, 17:35:37 MSK] {base.py:68} INFO - Using connection ID 'nip_standby' for task execution. [2022-06-23, 17:35:37 MSK] {base.py:68} INFO - Using connection ID 'stage' for task execution. [2022-06-23, 17:35:37 MSK] {data_transfer.py:198} INFO - Executing: SELECT * FROM GMP.SA_USER [2022-06-23, 17:35:37 MSK] {base.py:68} INFO - Using connection ID 'nip_standby' for task execution. [2022-06-23, 17:35:37 MSK] {taskinstance.py:1889} ERROR - Task failed with exception Traceback (most recent call last): File "/home/airflow/.local/lib/python3.8/site-packages/dwh_etl/operators/data_transfer.py", line 265, in execute if not self.no_check and self.compare_datasets(self.object_name, src, dest): File "/home/airflow/.local/lib/python3.8/site-packages/dwh_etl/operators/data_transfer.py", line 199, in compare_datasets src_df = src.get_pandas_df(sql) File "/home/airflow/.local/lib/python3.8/site-packages/airflow/hooks/dbapi.py", line 128, in get_pandas_df with closing(self.get_conn()) as conn: File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/oracle/hooks/oracle.py", line 149, in get_conn conn = oracledb.connect(**conn_config) File "/home/airflow/.local/lib/python3.8/site-packages/oracledb/connection.py", line 1000, in connect return conn_class(dsn=dsn, pool=pool, params=params, **kwargs) File "/home/airflow/.local/lib/python3.8/site-packages/oracledb/connection.py", line 128, in init impl.connect(params_impl) File "src/oracledb/impl/thin/connection.pyx", line 345, in oracledb.thin_impl.ThinConnImpl.connect File "src/oracledb/impl/thin/connection.pyx", line 163, in oracledb.thin_impl.ThinConnImpl._connect_with_params File "src/oracledb/impl/thin/connection.pyx", line 129, in oracledb.thin_impl.ThinConnImpl._connect_with_description File "src/oracledb/impl/thin/connection.pyx", line 250, in oracledb.thin_impl.ThinConnImpl._connect_with_address File "src/oracledb/impl/thin/protocol.pyx", line 197, in oracledb.thin_impl.Protocol._connect_phase_two File "src/oracledb/impl/thin/protocol.pyx", line 263, in oracledb.thin_impl.Protocol._process_message File "src/oracledb/impl/thin/protocol.pyx", line 242, in oracledb.thin_impl.Protocol._process_message File "src/oracledb/impl/thin/messages.pyx", line 280, in oracledb.thin_impl.Message.process File "src/oracledb/impl/thin/messages.pyx", line 2094, in oracledb.thin_impl.ProtocolMessage._process_message File "/home/airflow/.local/lib/python3.8/site-packages/oracledb/errors.py", line 103, in _raise_err raise exc_type(_Error(message)) from cause oracledb.exceptions.NotSupportedError: DPY-3012: national character set id 871 is not supported by python-oracledb in thin mode

Are you willing to submit PR?

  • [x] Yes I am willing to submit a PR!

Code of Conduct

skolchin avatar Jun 23 '22 14:06 skolchin

Thanks for opening your first issue here! Be sure to follow the issue template!

boring-cyborg[bot] avatar Jun 23 '22 14:06 boring-cyborg[bot]

There are two things you can do here: @skolchin (and I'd love if you could contribute it):

  • Documentation on provider 3.1.0 should mention that only UTF-8 can be used

or (better)

  • a "thick_mode" parameter could be added to initialization code and initialization

Since you have good testing environment for that - maybe you could add PR for hte latter? Shall I assign it to you @skolchin ?

potiuk avatar Jul 03 '22 12:07 potiuk

I did it provisionally since you marked "willing to submit a PR" :)

potiuk avatar Jul 03 '22 12:07 potiuk

@skolchin Were you able to work on this at all? I was testing using the newer oracle provider today and ran into an issue (see https://github.com/oracle/python-oracledb/issues/47) that can also be resolved by allowing for use of the thick client. I should have time to work on a PR tomorrow if you don't already have something ready to go.

pauldalewilliams avatar Sep 21 '22 02:09 pauldalewilliams