airflow icon indicating copy to clipboard operation
airflow copied to clipboard

RedshiftDataOperator fails when `return_sql_result` is true, and SQL statements are provided

Open thesuperzapper opened this issue 1 year ago • 1 comments

Apache Airflow Provider(s)

amazon

Versions of Apache Airflow Providers

Affects all current versions of apache-airflow-providers-amazon, including 8.24.0

Apache Airflow version

N/A - all versions

Operating System

NA

Deployment

Other

Deployment details

No response

What happened

There is a bug in RedshiftDataOperator if multiple sql queries are passed, and return_sql_result is set to true, then you will get the following error:

An error occurred (ValidationException) when calling the GetStatementResult operation: BatchExecuteStatement result can only be retrieved with sub-statement id.

What you think should happen instead

No response

How to reproduce

Run a RedshiftDataOperator like this:


from airflow.providers.amazon.aws.operators.redshift_data import RedshiftDataOperator
run_query = RedshiftDataOperator(
    task_id="run_query",
    aws_conn_id="MY_AWS_CONNECTION",
    #
    # Redshift parameters
    cluster_identifier="MY_REDSHIFT_CLUSTER",
    region="MY_REGION",
    database="MY_DB",
    sql=[
        "SELECT 1;",
        "SELECT 2;",
        "SELECT 3;",
    ],
    #
    # Redshift Data API parameters
    return_sql_result=True,
    statement_name="SOME_NAME",
    secret_arn=(
        "arn:aws:secretsmanager"
        ":MY_REGION:MY_ACCOUNT"
        ":secret:MY_DATA_API_CREDENTIAL_SECRET"
    ),
    #
    # Trigger parameters
    wait_for_completion=True,
    poll_interval=10,
)

Anything else

To fix this we will need to record the length of sql list, and loop through get_statement_result for each sub-statement id because you can only get one statement result at a time.

For example, if there are three, you would get the results of the following queries, starting from :1:

  • d9b6c0c9-0747-4bf4-b142-e8883122f766:1
  • d9b6c0c9-0747-4bf4-b142-e8883122f766:2
  • d9b6c0c9-0747-4bf4-b142-e8883122f766:3

There are TWO difference places where we call get_statement_result that need to be updated:

  1. For non-deferred mode in execute():
    • https://github.com/apache/airflow/blob/providers-amazon/8.24.0/airflow/providers/amazon/aws/operators/redshift_data.py#L165
  2. For deferred mode in execute_complete()
    • https://github.com/apache/airflow/blob/providers-amazon/8.24.0/airflow/providers/amazon/aws/operators/redshift_data.py#L186
    • NOTE: we will need to have the deferred operator bubble the number of statements in its yield TriggerEvent( response
      • https://github.com/apache/airflow/blob/providers-amazon/8.24.0/airflow/providers/amazon/aws/triggers/redshift_data.py#L99-L101

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

thesuperzapper avatar Jun 26 '24 00:06 thesuperzapper

Hello @thesuperzapper,

I have made some changes to the RedshiftDataOperator to address the issue with handling multiple SQL statements. Please review my pull request https://github.com/apache/airflow/pull/40443 and let me know if my approach is correct.

isatyamks avatar Jun 26 '24 17:06 isatyamks

Was this issue resolved? If not, I'd love to have it assigned to myself. Thanks!

jroachgolf84 avatar Jul 14 '24 02:07 jroachgolf84

@jroachgolf84 As of now, this issue has not been resolved. Please feel free to take it on. Thanks!

isatyamks avatar Jul 15 '24 10:07 isatyamks

@jroachgolf84 As of now, this issue has not been resolved. Please feel free to take it on. Thanks!

Thank you! Do you mind adding me as the assignee?

jroachgolf84 avatar Jul 15 '24 13:07 jroachgolf84

assigned

potiuk avatar Jul 15 '24 14:07 potiuk

@potiuk can you unassign @isatyamks as no progress has been made.

If anyone else wants to work on this, please comment.

thesuperzapper avatar Aug 15 '24 23:08 thesuperzapper

@potiuk can you unassign @isatyamks as no progress has been made.

If anyone else wants to work on this, please comment.

I've got the skeleton of a pull request started, apologies for the delay. I'm continuing to work on this.

jroachgolf84 avatar Aug 15 '24 23:08 jroachgolf84

This issue can be closed, PR: https://github.com/apache/airflow/pull/42900 addressed this issue and was merged.

jroachgolf84 avatar Jan 18 '25 14:01 jroachgolf84