Retrieve AWS Cloudwatch logs using aws_conn_id connection for GlueOperator
Apache Airflow version
2.4.0
What happened
We are using a cross-account AWS connection to invoke a Glue job via GlueOperator. When the verbose flag is set, we expected the CloudWatch logs to be retrieved from the AWS account where the Glue job was executed and displayed in the Airflow logs.
Instead, we consistently receive the error below in the Airflow logs rather than the CloudWatch logs:
Polling for AWS Glue Job <Job name> current run state with status RUNNING No new Glue driver logs found. This might be because there are no new logs, or might be an error. If the error persists, check the CloudWatch dashboard at: https://ap-southeast-1.console.aws.amazon.com/cloudwatch/home
Underlying issue
The code below suggests that a new connection is created to retrieve the logs from the same account as the invoker of the Glue job:
airflow/glue.py at cc4f245758340f5fc278bbbdc958a40b85f39bb8 · apache/airflow
But since we are executing the Glue job from a different account, the code needs to be updated to reuse the same connection when retrieving the logs.
What you think should happen instead
When the AWS GlueOperator is called with an aws_conn_id parameter and the verbose flag, we expect:
- the Glue job to be executed in the AWS account specified by aws_conn_id
- the CloudWatch logs to be retrieved from the AWS account specified by aws_conn_id and displayed in the Airflow logs
Proposed Solution
- Don't create a new boto3 client for logs (https://github.com/apache/airflow/blob/cc4f245758340f5fc278bbbdc958a40b85f39bb8/airflow/providers/amazon/aws/hooks/glue.py#L155)
- Create the client via the base AwsBaseHook class's get_client_type method (https://github.com/apache/airflow/blob/4bf0cb98724a2cf04aab6359881a87aeb9cec0ce/airflow/providers/amazon/aws/hooks/base_aws.py#L438)
- Update the Glue hook's print_job_logs method to use the boto3 get_log_events function rather than filter_log_events (https://github.com/apache/airflow/blob/4bf0cb98724a2cf04aab6359881a87aeb9cec0ce/airflow/providers/amazon/aws/hooks/glue.py#L165). This is a second problem: the filter operation is unsuitable for retrieving verbose logs, because when filter_log_events is called with no filter the response contains no events.
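The credential-sharing idea behind the proposal can be sketched with self-contained stand-in classes (FakeLogsClient and FakeGlueHook below are illustrative only; the real change would go through AwsBaseHook.get_client_type and the boto3 CloudWatch Logs client):

```python
class FakeLogsClient:
    """Stands in for a boto3 CloudWatch Logs client bound to one account."""

    def __init__(self, account):
        self.account = account

    def get_log_events(self, logGroupName, logStreamName):
        # A real client would call the CloudWatch Logs GetLogEvents API here
        # (get_log_events rather than filter_log_events, per the proposal).
        return {"events": [{"message": f"log from {self.account}"}]}


class FakeGlueHook:
    """Stands in for the Glue hook; it already holds the connection
    (aws_conn_id) that was used to submit the cross-account job."""

    def __init__(self, aws_conn_id):
        self.aws_conn_id = aws_conn_id

    def get_client_type(self, client_type):
        # Build the logs client from the hook's own connection instead of a
        # fresh boto3.client("logs"), so the logs come from the account that
        # actually ran the Glue job.
        return FakeLogsClient(self.aws_conn_id)

    def print_job_logs(self, log_group, stream):
        logs = self.get_client_type("logs")
        for event in logs.get_log_events(
            logGroupName=log_group, logStreamName=stream
        )["events"]:
            print(event["message"])


hook = FakeGlueHook(aws_conn_id="different_aws_account")
hook.print_job_logs("/aws-glue/jobs/output", "jr_123")
# prints "log from different_aws_account"
```

The point of the pattern is simply that print_job_logs never constructs its own credentials: it derives the logs client from the same hook (and thus the same aws_conn_id) that submitted the job.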
How to reproduce
- Submit a Glue job in a different AWS account using GlueOperator, as below:

```python
with DAG(
    dag_id="sample_dag",
    description="Sample DAG testing",
    schedule_interval=None,
    start_date=datetime(2022, 9, 1),
    catchup=False,
) as dag:
    submit_glue_job = SeekGlueJobOperator(
        aws_conn_id="different_aws_account",
        task_id="submit_glue_job",
        job_name=job_name,
        wait_for_completion=True,
        retry_limit=1,
        script_location=None,
        iam_role_name="iam-role-for-glue-job-invocation",
        script_args={},
        run_job_kwargs={"NumberOfWorkers": 2, "WorkerType": "Standard"},
        verbose=True,
        region_name="ap-southeast-1",
    )
```

- Check the Airflow logs to see whether the CloudWatch logs are retrieved and displayed.
Operating System
Mac 12.6
Versions of Apache Airflow Providers
No response
Deployment
Docker-Compose
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for opening your first issue here! Be sure to follow the issue template!
@ChirangaL, I've made some suggestions on the existing PR: https://github.com/apache/airflow/pull/26269#discussion_r969486988 If I understand you correctly, they should help in your case. It is also possible to make these changes in a separate PR (quite simple).
Hi @Taragolis You are correct. The updates in the mentioned PR will resolve our issue, provided the proposed update below is performed: https://github.com/apache/airflow/pull/26269/files/3e4e6db6a2e199483022f27dc4681dfae293b2ed#r969486988
I linked the issue to the PR.
The PR has been updated; the access key and secret are now fetched from the hook credentials. @ChirangaL, please have a look at the current state of the PR to see if it suffices for your use case.
@o-nikolas - given there is no response for more than a month, can we close this issue for now? @ChirangaL - feel free to open it again if the solution doesn't work for your use case.
Agreed, I will close the issue