dbt-oracle icon indicating copy to clipboard operation
dbt-oracle copied to clipboard

[Bug] Oracle run_result.json has rows_affected = 0 always when data are inserted or updated into target table

Open cindyzh-ibm opened this issue 1 year ago • 14 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

run_results.json has rows_affected = 0 always even when there are data are inserted into target table or updated in the target table.

Screenshot 2024-02-22 at 11 11 56 AM

{
  "metadata": {
    "dbt_schema_version": "https://schemas.getdbt.com/dbt/run-results/v4.json",
    "dbt_version": "1.4.99",
    "generated_at": "2024-02-21T17:54:33.596062Z",
    "invocation_id": "5ca55e12-7cc4-43be-8ef2-754c4fb1fc0c",
    "env": {
      
    }
  },
  "results": [
    {
      "status": "success",
      "timing": [
        {
          "name": "compile",
          "started_at": "2024-02-21T17:54:31.748147Z",
          "completed_at": "2024-02-21T17:54:31.765072Z"
        },
        {
          "name": "execute",
          "started_at": "2024-02-21T17:54:31.801492Z",
          "completed_at": "2024-02-21T17:54:33.039715Z"
        }
      ],
      "thread_id": "Thread-2 (worker)",
      "execution_time": 1.3536255359649658,
      "adapter_response": {
        "_message": "OK",
        "rows_affected": 0
      },
      "message": "OK",
      "failures": null,
      "unique_id": "model.DBT_RUN_MATRIX_TEST_2_0.RUN_MATRIX_TGT_3"
    },
    {
      "status": "success",
      "timing": [
        {
          "name": "compile",
          "started_at": "2024-02-21T17:54:31.742269Z",
          "completed_at": "2024-02-21T17:54:31.763914Z"
        },
        {
          "name": "execute",
          "started_at": "2024-02-21T17:54:31.771594Z",
          "completed_at": "2024-02-21T17:54:33.061006Z"
        }
      ],
      "thread_id": "Thread-1 (worker)",
      "execution_time": 1.3784840106964111,
      "adapter_response": {
        "_message": "OK",
        "rows_affected": 0
      },
      "message": "OK",
      "failures": null,
      "unique_id": "model.DBT_RUN_MATRIX_TEST_2_0.RUN_MATRIX_TGT_1"
    }
  ],
  "elapsed_time": 4.044718503952026,
  "args": {
    
  }
}

Expected Behavior

Expected run_results.json have

  • code to indicate insert, update, or delete
  • corresponding rows_affected reflects the number of rows are either deleted, inserted, or updated.

Steps To Reproduce

Create a flow with Oracle as source and target, Run the flow to get the run_results.json

Relevant log output using --debug flag enabled

No response

Environment

- OS:
- Python:
- dbt:1.4.9

What Oracle database version are you using dbt with?

No response

Additional Context

No response

cindyzh-ibm avatar Feb 22 '24 19:02 cindyzh-ibm

@cindyzh-ibm

Which python driver are you using ?

is it cx_Oracle or the new oracledb python driver.

dbt-oracle uses the python driver's cursor.rowcount attribute to populate rows_affected in run_results.json.

Please switch to the new driver if you have not done. To switch to the thin driver, you can set the following environment variable before running dbt

export ORA_PYTHON_DRIVER_TYPE=thin

This should populate rows_affected correctly.

aosingh avatar Feb 22 '24 21:02 aosingh

Thanks, @aosingh , Let me check and get back to you

cindyzh-ibm avatar Feb 22 '24 23:02 cindyzh-ibm

Hi @aosingh we are using oracledb python driver. The default is cx_oracle which is deprecating, so we are using oracledb and we do using thin mode. Following is the piece from our log.

15 2/21/2024 09:59:32 INFO IIS-DSEE-TODT-00002 <dbt,0> 17:59:26  oracle adapter: Running in thin mode
16 2/21/2024 09:59:32 INFO IIS-DSEE-TODT-00002 <dbt,0> 17:59:26  Running with dbt=1.4.99 

This is our command

dbt
-command 'ORA_PYTHON_DRIVER_TYPE="thin" dbt run --vars \'{....}\' --profiles-dir "..."
;

cindyzh-ibm avatar Feb 23 '24 00:02 cindyzh-ibm

Thank you @cindyzh-ibm.

What is the version of oracledb driver ?

python3 -c "import oracledb; print(oracledb.__version__)"

Also, please share the output of command

dbt --version

I have been trying to reproduce the problem with dbt-oracle==1.4.4 which is the latest version published in 1.4.x release line. Are you using incremental models ?

I tried a few simple cases which seem to work fine.

  1. dbt seed - Insertion works as expected
"results": [
        {
            "status": "success",
            "timing": [
                {
                    "name": "compile",
                    "started_at": "2024-02-23T01:26:39.822399Z",
                    "completed_at": "2024-02-23T01:26:39.822408Z"
                },
                {
                    "name": "execute",
                    "started_at": "2024-02-23T01:26:39.824222Z",
                    "completed_at": "2024-02-23T01:26:40.628122Z"
                }
            ],
            "thread_id": "Thread-1",
            "execution_time": 0.8198621273040771,
            "adapter_response": {
                "_message": "INSERT 5",
                "code": "INSERT",
                "rows_affected": 5
            },
            "message": "INSERT 5",
            "failures": null,
            "unique_id": "seed.dbt_adbs_test_project.seed"
        }
    ],
  1. Table materialization - Creation of Table also shows the rows affected
    "results": [
        {
            "status": "success",
            "timing": [
                {
                    "name": "compile",
                    "started_at": "2024-02-23T01:28:30.549139Z",
                    "completed_at": "2024-02-23T01:28:30.555564Z"
                },
                {
                    "name": "execute",
                    "started_at": "2024-02-23T01:28:30.556883Z",
                    "completed_at": "2024-02-23T01:28:31.258269Z"
                }
            ],
            "thread_id": "Thread-1",
            "execution_time": 0.730828046798706,
            "adapter_response": {
                "_message": "OK",
                "rows_affected": 5
            },
            "message": "OK",
            "failures": null,
            "unique_id": "model.dbt_adbs_test_project.people"
        }
    ],

aosingh avatar Feb 23 '24 01:02 aosingh

cc: @anthony-tuininga

aosingh avatar Feb 23 '24 01:02 aosingh

@aosingh

sh-4.4$ python3 -c "import oracledb; print(oracledb.__version__)"
1.4.1
sh-4.4$ dbt --version
03:06:30  oracle adapter: Running in cx mode
03:06:30  oracle adapter: [WARNING]: cx_oracle is no longer maintained, use python-oracledb

To switch to python-oracledb set the environment variable ORA_PYTHON_DRIVER_TYPE=thin 

Read the guideline here: https://docs.getdbt.com/reference/warehouse-setups/oracle-setup#configure-the-python-driver-mode

Documentation for python-oracledb can be found here: https://oracle.github.io/python-oracledb/
Core:
  - installed: 1.4.99
  - latest:    1.7.8  - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - bigquery:  1.4.1  - Update available!
  - oracle:    1.4.8  - Update available!
  - postgres:  1.4.99 - Update available!
  - redshift:  1.4.1  - Update available!
  - snowflake: 1.4.5  - Update available!
  - teradata:  1.4.99 - Update available!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

cindyzh-ibm avatar Feb 23 '24 03:02 cindyzh-ibm

For your question of "Are you using incremental models ?". No

{{
  config(
    materialized='datastage'
    , table_action='append'
    ....
  )
}}

cindyzh-ibm avatar Feb 23 '24 16:02 cindyzh-ibm

@cindyzh-ibm ismaterialized='datastage' a custom materialization which you have implemented ? dbt doesn't implement "datastage" materialization.

Could you verify "target" relation is returned in the end ?

-- Return the relations created in this materialization
  {{ return({'relations': [target_relation]}) }}

aosingh avatar Feb 23 '24 18:02 aosingh

@aosingh Yes, we use custom materialization.

No, we don't have "target" relation is returned in the end. We prepare statement and run run_query macro which delegates execution to dbt core by means of specific adapter

{{
  config(
    materialized='datastage'
    
    , write_mode='update'
    , key_columns=['ID']
    
    , force_commit='false'
    
    , use_adapter_drop_statement='false'
  )
}}
{{
  config(
    materialized='datastage'
    
    , table_action='append'
    , write_mode='insert'
    
    , force_commit='false'
    
    , use_adapter_drop_statement='false'
  )
}}

cindyzh-ibm avatar Feb 23 '24 19:02 cindyzh-ibm

@cindyzh-ibm what is this statement ? is it an anonymous PL/SQL block or CTAS ? Details will help me reproduce the problem.

aosingh avatar Feb 23 '24 22:02 aosingh

@aosingh custom materialization prepares and submits sql statements

cindyzh-ibm avatar Feb 24 '24 00:02 cindyzh-ibm

@cindyzh-ibm

I understand these are SQL statements. What is the last statement executed in your custom materialization?

dbt-oracle uses cursor.rowcount to get the number of rows affected

The documentation states that cursor.rowcount is a read-only attribute which specifies the number of rows that have currently been fetched from the cursor (for select statements) or that have been affected by the operation (for insert, update, delete and merge statements). For all other statements the value is always zero.

Immediately after the cursor.execute() call has been made, if no rows have been fetched, the result will be 0. If you call cursor.fetchone() then result will be 1 and if you call cursor.fetchmany(5) then the result will be 6, and so forth

Did you try returning the target relation in the end ?

-- Return the relations created in this materialization
  {{ return({'relations': [target_relation]}) }}

aosingh avatar Feb 26 '24 18:02 aosingh

@aosingh ok. This is insert statement

BEGIN
  EXECUTE IMMEDIATE '
 insert into "test"."RUN_MATRIX_TGT_1" ("ID", "NAME", "HIRE_DATE", "SALARY")
    select "ID", "NAME", "HIRE_DATE", "SALARY"
    from  "test"."test__dbt_tmp"
';
END;

Update statement

BEGIN
  EXECUTE IMMEDIATE '
  update "test"."RUN_MATRIX_TGT_3"
  set
  ("ID", "NAME", "HIRE_DATE", "SALARY")
  = ( select "ID", "NAME", "HIRE_DATE", "SALARY"
    from "test"."test__dbt_tmp"
    where 
                    "test"."test__dbt_tmp"."ID" = "test"."test"."ID"
                )
';
END;

I haven't tried following yet. So far we don't have "target" relation is returned in the end.

-- Return the relations created in this materialization
  {{ return({'relations': [target_relation]}) }}

cindyzh-ibm avatar Feb 26 '24 19:02 cindyzh-ibm

@cindyzh-ibm

Thanks, you are executing a PL/SQL(BEGIN..END) block in the custom materialization.

The value for cursor.rowcount will be 0 after executing a PL/SQL block. Let me check if we can use sql%rowcount expression immediately to get the row affected and pass that to dbt.

aosingh avatar Feb 26 '24 21:02 aosingh