Force show dataflow job URL!

Open Ox0400 opened this issue 7 months ago • 8 comments

Fix: RuntimeError: No dataflow job was found when running the python file.

create_python_job in google_cloud_pipeline_components/container/v1/dataflow/dataflow_python_job_remote_runner.py requires a job URL in the output that it can match:

https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/container/v1/dataflow/dataflow_python_job_remote_runner.py#L106

When running on GCP, google_cloud_pipeline_components matches the job URL with a regular expression (re), so the URL must be present in the output!

import time

from google_cloud_pipeline_components.v1.dataflow import DataflowPythonJobOp
from kfp import dsl
...


@dsl.pipeline(name='test dataflow')
def test_dataflow():
    dataflow_task = DataflowPythonJobOp(
        project=project,
        location=region,
        python_module_path=dataflow_clean_local_path,
        requirements_file_path=requirements_file_path,
        temp_location=temp_location,
        args=[
            "--project", project,
            "--region", region,
            "--temp_location", temp_location,
            "--job_name", f"dataflow-clean-{time.strftime('%Y%m%d-%H%M%S', time.gmtime())}",
            "--save_main_session",
            "--runner", "DataflowRunner",
        ],
    )

...
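
The URL matching described above can be sketched roughly as follows. This is a minimal illustration, not the code in dataflow_python_job_remote_runner.py: the pattern, the helper name `extract_location_and_job_id`, and the sample log line are all assumptions made for the example. It shows why the remote runner fails when the URL never appears in the output: with no matching line there is nothing to extract.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for illustration; the real runner's expression may differ.
JOB_URL_PATTERN = re.compile(
    r"console\.cloud\.google\.com/dataflow/jobs/"
    r"(?P<location>[A-Za-z0-9_-]+)/(?P<job_id>[A-Za-z0-9_-]+)"
)


def extract_location_and_job_id(line: str) -> Optional[Tuple[str, str]]:
    """Return (location, job_id) if the line contains a Dataflow console URL."""
    match = JOB_URL_PATTERN.search(line)
    if match is None:
        return None
    return match.group("location"), match.group("job_id")


# Made-up log line with a made-up job ID and project.
sample = (
    "To access the Dataflow monitoring console, please navigate to "
    "https://console.cloud.google.com/dataflow/jobs/us-central1/"
    "2025-05-15_04_05_00-1234567890?project=my-project"
)
print(extract_location_and_job_id(sample))
print(extract_location_and_job_id("no URL in this line"))
```

If the log level of the launched process hides the line carrying the URL, a matcher like this only ever sees lines of the second kind and returns None.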


Ox0400 • May 15 '25 04:05

Assigning reviewers:

R: @shunping for label python.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

github-actions[bot] • May 15 '25 05:05

Logging the job id as a warning message looks weird. It is more straightforward for the users to set the logging level in their pipeline code.

https://github.com/apache/beam/issues/35013#issuecomment-2895550883
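
Setting the logging level in pipeline code, as suggested here, might look like the sketch below. It assumes the runner reports the job URL at INFO level, which this thread does not confirm; the helper name `enable_info_logging` is hypothetical and uses only the standard library.

```python
import logging


def enable_info_logging() -> None:
    """Configure the root logger so INFO-level messages are emitted."""
    # basicConfig attaches a handler if none exists yet.
    logging.basicConfig(level=logging.INFO)
    # setLevel also covers the case where a handler was already configured.
    logging.getLogger().setLevel(logging.INFO)


enable_info_logging()
logging.info("INFO-level messages are now visible")
```

Calling this before the pipeline runs keeps the choice of verbosity in user code instead of promoting a single message to a warning.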

shunping • May 20 '25 19:05

cc @liferoad

shunping • May 30 '25 15:05

I am fine with this as a workaround. I think my comment https://github.com/apache/beam/issues/35013#issuecomment-2916338539 is still the right way to fix this.

liferoad • May 30 '25 17:05

This could add additional logging that some users may find annoying. I'm fine with printing one line, but maybe not all of the logs from apitools.

When running on GCP, google_cloud_pipeline_components matches the job URL with a regular expression (re), so the URL must be present in the output!

Why not extract the job ID as follows:

  import apache_beam as beam
  from apache_beam.runners.dataflow.dataflow_runner import DataflowPipelineResult

  p = beam.Pipeline(argv=pipeline_args)
  p | <...>
  result = p.run()
  # job_id() is only available when the pipeline ran on Dataflow.
  if isinstance(result, DataflowPipelineResult):
    print(result.job_id())
  result.wait_until_finish()
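
Once the job ID is available this way, printing a single line with the console URL could be sketched as below. The helper `dataflow_console_url` is hypothetical, the URL layout is an assumption based on the usual Dataflow console link format, and the project, region, and job ID values are made up.

```python
def dataflow_console_url(project: str, region: str, job_id: str) -> str:
    """Build a Dataflow console URL (assumed layout) for a job."""
    return (
        "https://console.cloud.google.com/dataflow/jobs/"
        f"{region}/{job_id}?project={project}"
    )


# Made-up values for illustration only.
url = dataflow_console_url(
    project="my-project",
    region="us-central1",
    job_id="2025-05-15_04_05_00-1234567890",
)
print(url)
```

This emits exactly one line, which matches the preference stated above for a single message over the full apitools logs.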

tvalentyn • Jun 02 '25 16:06

#35013 (comment)

https://github.com/apache/beam/issues/35013#issuecomment-2908695423

liferoad • Jun 02 '25 16:06

Reminder, please take a look at this PR: @shunping

github-actions[bot] • Jun 10 '25 12:06

waiting on author

shunping • Jun 10 '25 18:06

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

github-actions[bot] • Aug 10 '25 12:08

waiting on author

derrickaw • Aug 12 '25 15:08

I apologize for the delay in responding. After careful consideration, I'm not inclined to fix this bug by adding extensive new code or features. Since we haven't reached a consensus, I'll be closing this pull request along with the related issue.

Ox0400 • Aug 27 '25 03:08