[sdk] KFP Client's `wait_for_run_completion` is not terminating properly.

Open seanswyi opened this issue 1 year ago • 2 comments

Environment

  • KFP version: 1.8.22
  • KFP SDK version: 1

Steps to reproduce

I’m currently using the KFP client to execute runs with different parameters as follows:

    run = client.run_pipeline(
        experiment_id=experiment_id,
        job_name=f"{model}-{data_version}",
        params=run_params,
        pipeline_id=pipeline_id,
        version_id=version_id,
    )

    run_id = run.id
    client.wait_for_run_completion(
        run_id=run_id,
        timeout=172800,  # 2 days (1 day = 86400 seconds).
    )

The problem is that when one run completes, I get the following error message:

    Traceback (most recent call last):
      File "/home/user/pipeline/train-pipeline/run_all_models.py", line 135, in <module>
        main(args=args)
      File "/home/user/pipeline/train-pipeline/run_all_models.py", line 103, in main
        client.wait_for_run_completion(
      File "/home/user/.venv/lib/python3.10/site-packages/kfp/_client.py", line 1266, in wait_for_run_completion
        status = get_run_response.run.status
    AttributeError: 'NoneType' object has no attribute 'status'

Why is this happening?

Expected result

The expected result is that it should move on to the next iteration of the loop.
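
The traceback shows that `get_run_response.run` was `None` when the SDK read `.status`; the thread does not establish why. One possible workaround, sketched here and not confirmed by anyone in this issue, is to poll manually and treat a missing `run` as "not finished yet" instead of crashing. The helper name `wait_for_run`, its parameters, and the set of terminal states are assumptions for illustration and should be checked against the installed KFP version.

```python
import time
from typing import Any, Callable

# Assumed terminal states (lowercased); verify against your KFP version.
TERMINAL_STATES = {"succeeded", "failed", "skipped", "error"}


def wait_for_run(
    get_run: Callable[[str], Any],
    run_id: str,
    timeout: float,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `get_run` until the run reports a terminal status.

    Hypothetical helper: a response whose `run` (or `run.status`) is None
    is treated as "still pending" rather than raising AttributeError.
    """
    deadline = time.monotonic() + timeout
    while True:
        response = get_run(run_id)
        run = getattr(response, "run", None)
        status = getattr(run, "status", None)
        if status is not None and status.lower() in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run {run_id} did not finish within {timeout} seconds")
        sleep(poll_interval)
```

With a real client this would be called as `wait_for_run(client.get_run, run_id, timeout=172800)` in place of `client.wait_for_run_completion(...)`. It only hides the symptom: if the backend never returns a populated `run`, the call ends in `TimeoutError` rather than `AttributeError`.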

seanswyi avatar Feb 06 '24 05:02 seanswyi

KFP version: 1.8.22 KFP SDK version: 1

@seanswyi can you please double check and confirm the versions for KFP runtime and KFP SDK?

chensun avatar Feb 15 '24 23:02 chensun

@chensun Sorry for the late reply. The versions are correct.

seanswyi avatar Feb 20 '24 00:02 seanswyi

Can you try upgrading to KFP 2.0.5 and SDK 2.7.0 and let us know if the issue remains?

rimolive avatar Mar 06 '24 18:03 rimolive

@rimolive Unfortunately I don't have access to the relevant resources anymore, so I don't think I'd be able to attempt that for the time being. It seems like a lot of these bugs have been fixed in v2 but I'm just speaking based off of assumptions here. I think we can close this for now and reopen it later if the problem persists?

seanswyi avatar Mar 07 '24 00:03 seanswyi

No worries! I'm closing the ticket but feel free to reopen if issue persists.

/close

rimolive avatar Mar 07 '24 09:03 rimolive

@rimolive: Closing this issue.

google-oss-prow[bot] avatar Mar 07 '24 09:03 google-oss-prow[bot]