update pattern for dataflow job id extraction
The Dataflow job id is extracted from the logged output of the Java process that starts the Dataflow job, for example in the case of BeamRunJavaPipelineOperator.
Currently the job id pattern matches characters until the first " or \n is encountered, which works for the following case:
- logged line:
[2024-08-27 11:20:22,094] INFO Submitted job: 2024-08-27_04_20_21-7947372725816706151
- extracted job id:
2024-08-27_04_20_21-7947372725816706151
However, if the logger is configured differently, for example so that the line ends with a whitespace followed by a suffix carrying additional information, the pattern extracts the id together with the suffix:
- logged line:
[2024-08-27 11:20:22,094] INFO Submitted job: 2024-08-27_04_20_21-7947372725816706151 (org.apache.beam.runners.dataflow.DataflowRunner) (main)
- extracted job id:
2024-08-27_04_20_21-7947372725816706151 (org.apache.beam.runners.dataflow.DataflowRunner) (main)
In the previous example, the suffix (org.apache.beam.runners.dataflow.DataflowRunner) (main) should not be extracted as part of the job id.
I updated the pattern by adding the whitespace character \s (alongside the existing " and \n) as a terminator that marks the end of the job id.
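The effect of the change can be sketched with two simplified patterns. This is an illustration only: OLD_PATTERN, NEW_PATTERN, and extract_job_id are hypothetical names, and the patterns are reduced to the "Submitted job:" case rather than copied from the Airflow source.

```python
import re

# Hypothetical, simplified patterns for illustration; not the exact Airflow patterns.
# Old behaviour: the id runs until the first double quote or newline.
OLD_PATTERN = re.compile(r'Submitted job: (?P<job_id>[^"\n]*)')
# New behaviour: any whitespace character also ends the id.
NEW_PATTERN = re.compile(r'Submitted job: (?P<job_id>[^"\n\s]*)')


def extract_job_id(pattern, line):
    """Return the captured job id, or None if the line does not match."""
    match = pattern.search(line)
    return match.group("job_id") if match else None


JOB_ID = "2024-08-27_04_20_21-7947372725816706151"
plain_line = f"[2024-08-27 11:20:22,094] INFO Submitted job: {JOB_ID}"
suffixed_line = plain_line + " (org.apache.beam.runners.dataflow.DataflowRunner) (main)"

for name, pattern in (("old", OLD_PATTERN), ("new", NEW_PATTERN)):
    print(name, "plain:   ", extract_job_id(pattern, plain_line))
    print(name, "suffixed:", extract_job_id(pattern, suffixed_line))
```

With these simplified patterns, both variants return the bare id for the plain line, while only the new one returns the bare id for the line with the logger suffix.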