[google-cloud-pipeline-components] Job Name in gcp_resource is not formatted correctly or is empty

Open Qingwt opened this issue 3 years ago • 7 comments

Environment

  • KFP SDK version: 1.8.9

  • All dependencies version: google_cloud_pipeline_components version: 0.2.0

Steps to reproduce

I followed the Vertex AI pipeline tutorial and created a training job with the code below:

    training_op = gcc_aip.CustomContainerTrainingJobRunOp(
        display_name="test",
        container_uri=container_uri,
        project=project,
        location=gcp_region,
        dataset=dataset_create_op.outputs["dataset"],
        staging_bucket=bucket,
        training_fraction_split=0.8,
        validation_fraction_split=0.1,
        test_fraction_split=0.1,
        model_serving_container_image_uri="custom container image uri",
        model_serving_container_health_route="/healthcheck",
        model_serving_container_predict_route="/predict",
        model_display_name="scikit-tests",
        machine_type="n1-standard-4",
    )

After submitting the pipeline job, I received the error "Job Name in gcp_resource is not formatted correctly or is empty". By checking the source code, I suspect there is a bug in google_cloud_pipeline_components\container\experimental\gcp_launcher\job_remote_runner.py at line 80

The source code is

                job_name_group = re.findall(
                    job_resources.resources[0].resource_uri,
                    f'{self.job_uri_prefix}(.*)')

To the best of my knowledge, the signature is re.findall(pattern, string, flags=0) (see the Python docs), so the first argument should be the pattern. In the snippet above, the string to search appears to be passed as the first argument instead.
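To illustrate the suspected effect, here is a minimal sketch outside the library. The prefix and resource URI are hypothetical values made up for the example (the real ones come from the launcher), so this shows only how the argument order changes the result, not the library's actual behavior.

```python
import re

# Hypothetical values for illustration only; not taken from the library.
job_uri_prefix = "https://us-central1-aiplatform.googleapis.com/v1/"
resource_uri = job_uri_prefix + "projects/123/locations/us-central1/customJobs/456"

# Argument order as in the snippet above: the URI is treated as the pattern
# and the prefix template as the string to search, so nothing matches.
swapped = re.findall(resource_uri, f"{job_uri_prefix}(.*)")

# Documented order: pattern first, then the string to search.
# The capture group returns everything after the prefix.
expected = re.findall(f"{job_uri_prefix}(.*)", resource_uri)

print(swapped)
print(expected)
```

With the swapped order the result is an empty list, which would be consistent with the "not formatted correctly or is empty" error. A fix might also wrap the prefix in re.escape so that the dots in the URL are matched literally.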

Could you please confirm if that is expected?

Qingwt avatar Dec 06 '21 20:12 Qingwt

cc @IronPan

zijianjoy avatar Dec 10 '21 00:12 zijianjoy

kfp -> 1.8.10 google-cloud-pipeline-components -> 0.2.1

I'm experiencing the same issue when running tasks created from google_cloud_pipeline_components.experimental.hyperparameter_tuning_job. I traced it to the same place, google_cloud_pipeline_components\container\experimental\gcp_launcher\job_remote_runner.py at line 80, where the pattern and string arguments to re.findall are flipped.

wutianhao910 avatar Jan 07 '22 15:01 wutianhao910

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 17 '22 06:04 stale[bot]

Hi there. Using ModelBatchPredictOp, I'm experiencing the same issue:

  • python = 3.8
  • google-cloud = 0.34.0
  • google-cloud-aiplatform = 1.12.0
  • kfp = 1.8.12
  • google-cloud-pipeline-components = 1.0.5

prediction_task = ModelBatchPredictOp(
    project=project,
    location=location,
    job_display_name=segment.job_name,
    model=loaded_model_task.outputs["model"],
    gcs_source_uris=segment.source,
    instances_format="jsonl",
    gcs_destination_output_uri_prefix=segment.results_prefix,
    predictions_format="jsonl",
    machine_type=machine_type,
    starting_replica_count=segment.starting_replica_count,
    max_replica_count=512,
)

What happens?

Traceback (most recent call last):
  File "/opt/python3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/python3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/python3.7/lib/python3.7/site-packages/google_cloud_pipeline_components/container/v1/gcp_launcher/launcher.py", line 229, in <module>
    main(sys.argv[1:])
  File "/opt/python3.7/lib/python3.7/site-packages/google_cloud_pipeline_components/container/v1/gcp_launcher/launcher.py", line 225, in main
    _JOB_TYPE_TO_ACTION_MAP[job_type](**parsed_args)
  File "/opt/python3.7/lib/python3.7/site-packages/google_cloud_pipeline_components/container/v1/gcp_launcher/batch_prediction_job_remote_runner.py", line 105, in create_batch_prediction_job
    job_name = remote_runner.check_if_job_exists()
  File "/opt/python3.7/lib/python3.7/site-packages/google_cloud_pipeline_components/container/v1/gcp_launcher/job_remote_runner.py", line 104, in check_if_job_exists
    'Job Name in gcp_resource is not formatted correctly or is empty.'
ValueError: Job Name in gcp_resource is not formatted correctly or is empty.

Any ideas or suggestions for dealing with this would be greatly appreciated.

andjedani avatar May 31 '22 11:05 andjedani

I am currently having the same issue when trying to run a HyperparameterTuningJobRunOp. Did you find a solution?

Houyon avatar Jul 28 '22 09:07 Houyon

I have exactly the same issue, when trying to run batch prediction over the uploaded vertex model: Job Name in gcp_resource is not formatted correctly or is empty.

Manual batch prediction fails with:

Batch prediction job batch_preduct encountered the following errors: Model server terminated: model server container terminated: exit_code: 1 reason: "Error" started_at { seconds: 1661869830 } finished_at { seconds: 1661869831 } .

The same model works fine when deployed to an endpoint. Any hints or workarounds would be greatly appreciated!

abdullin avatar Aug 31 '22 13:08 abdullin

Update: according to that commit, the issue was fixed in the code 19 days ago, but the fix doesn't appear to be scheduled for release any time soon.

abdullin avatar Aug 31 '22 13:08 abdullin