[google-cloud-pipeline-components] Job Name in gcp_resource is not formatted correctly or is empty
Environment
- KFP SDK version: 1.8.9
- All dependencies version: google_cloud_pipeline_components version: 0.2.0
Steps to reproduce
I followed the Vertex AI Pipelines tutorial and created a training job with the code below:
training_op = gcc_aip.CustomContainerTrainingJobRunOp(
    display_name = "test",
    container_uri=container_uri,
    project = project,
    location = gcp_region,
    dataset = dataset_create_op.outputs["dataset"],
    staging_bucket = bucket,
    training_fraction_split = 0.8,
    validation_fraction_split = 0.1,
    test_fraction_split = 0.1,
    model_serving_container_image_uri = "custom container image uri",
    model_serving_container_health_route="/healthcheck",
    model_serving_container_predict_route="/predict",
    model_display_name = "scikit-tests",
    machine_type = "n1-standard-4",
)
After submitting the pipeline job, I received the error "Job Name in gcp_resource is not formatted correctly or is empty". By checking the source code, I suspect there is a bug in google_cloud_pipeline_components\container\experimental\gcp_launcher\job_remote_runner.py at line 80
The source code is
job_name_group = re.findall(
    job_resources.resources[0].resource_uri,
    f'{self.job_uri_prefix}(.*)')
To the best of my knowledge, the signature is re.findall(pattern, string, flags=0) (per the Python docs), so the first argument should be the pattern. In the code snippet above, the string to search (the resource URI) is passed first and the pattern second.
Could you please confirm if that is expected?
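To illustrate the suspected argument swap, here is a minimal sketch. The prefix and resource URI values are made up for the example (the real ones are built by the launcher at runtime); only re.findall is the actual standard-library call.

```python
import re

# Hypothetical values for illustration only; not taken from the library.
job_uri_prefix = "https://us-central1-aiplatform.googleapis.com/v1/"
resource_uri = job_uri_prefix + "projects/123/locations/us-central1/customJobs/456"

# Argument order as quoted in the issue: the URI is treated as the pattern
# and the f-string as the text to search, so nothing matches.
flipped = re.findall(resource_uri, f"{job_uri_prefix}(.*)")

# Documented order: pattern first, then the string to search.
corrected = re.findall(f"{job_uri_prefix}(.*)", resource_uri)

print(flipped)    # []
print(corrected)  # ['projects/123/locations/us-central1/customJobs/456']
```

With the flipped order the result is an empty list, which would be consistent with the "not formatted correctly or is empty" error if the caller treats an empty match as invalid.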
cc @IronPan
kfp -> 1.8.10 google-cloud-pipeline-components -> 0.2.1
I'm experiencing the same issue when running tasks created from google_cloud_pipeline_components.experimental.hyperparameter_tuning_job
I traced it to the same place, google_cloud_pipeline_components\container\experimental\gcp_launcher\job_remote_runner.py at line 80, where the pattern and string arguments to re.findall are flipped.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi there
Using ModelBatchPredictOp I'm experiencing the same issue:
python = 3.8
google-cloud = 0.34.0
google-cloud-aiplatform = 1.12.0
kfp = 1.8.12
google-cloud-pipeline-components = 1.0.5
prediction_task = ModelBatchPredictOp(
    project=project,
    location=location,
    job_display_name=segment.job_name,
    model=loaded_model_task.outputs["model"],
    gcs_source_uris=segment.source,
    instances_format="jsonl",
    gcs_destination_output_uri_prefix=segment.results_prefix,
    predictions_format="jsonl",
    machine_type=machine_type,
    starting_replica_count=segment.starting_replica_count,
    max_replica_count=512,
)
What happens?
Traceback (most recent call last):
  File "/opt/python3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/python3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/python3.7/lib/python3.7/site-packages/google_cloud_pipeline_components/container/v1/gcp_launcher/launcher.py", line 229, in <module>
    main(sys.argv[1:])
  File "/opt/python3.7/lib/python3.7/site-packages/google_cloud_pipeline_components/container/v1/gcp_launcher/launcher.py", line 225, in main
    _JOB_TYPE_TO_ACTION_MAP[job_type](**parsed_args)
  File "/opt/python3.7/lib/python3.7/site-packages/google_cloud_pipeline_components/container/v1/gcp_launcher/batch_prediction_job_remote_runner.py", line 105, in create_batch_prediction_job
    job_name = remote_runner.check_if_job_exists()
  File "/opt/python3.7/lib/python3.7/site-packages/google_cloud_pipeline_components/container/v1/gcp_launcher/job_remote_runner.py", line 104, in check_if_job_exists
    'Job Name in gcp_resource is not formatted correctly or is empty.'
ValueError: Job Name in gcp_resource is not formatted correctly or is empty.
Any ideas or suggestions for dealing with this would be greatly appreciated.
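For anyone trying to reason about this error locally, below is a rough sketch of the kind of check that could produce this message. It is not the library's code: extract_job_name, its parameters, and the example URIs are hypothetical, and the real check_if_job_exists may differ. It only assumes that the job name is whatever follows a known URI prefix and that an empty result is rejected.

```python
import re


def extract_job_name(resource_uri: str, job_uri_prefix: str) -> str:
    """Hypothetical helper (not from google-cloud-pipeline-components).

    Returns the part of resource_uri that follows job_uri_prefix, or raises
    ValueError when nothing usable is found.
    """
    # Pattern first, string second; re.escape keeps dots in the prefix literal.
    matches = re.findall(f"{re.escape(job_uri_prefix)}(.*)", resource_uri)
    if not matches or not matches[0]:
        raise ValueError(
            "Job Name in gcp_resource is not formatted correctly or is empty."
        )
    return matches[0]


# Example values are made up for illustration.
prefix = "https://us-central1-aiplatform.googleapis.com/v1/"
name = extract_job_name(
    prefix + "projects/123/locations/us-central1/batchPredictionJobs/789",
    prefix,
)
print(name)
```

Under these assumptions, an empty resource URI or one that does not start with the expected prefix ends up in the ValueError branch, so it may be worth inspecting what the gcp_resources output actually contains when the component fails.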
I am currently having the same issue when trying to run a HyperparameterTuningJobRunOp. Did you find a solution?
I have exactly the same issue when trying to run batch prediction over an uploaded Vertex model: Job Name in gcp_resource is not formatted correctly or is empty.
Manual batch prediction fails with:
Batch prediction job batch_preduct encountered the following errors: Model server terminated: model server container terminated: exit_code: 1 reason: "Error" started_at { seconds: 1661869830 } finished_at { seconds: 1661869831 } .
The same model works fine when deployed to an endpoint. Any hints or workarounds would be greatly appreciated!
Update: according to that commit, the issue was fixed in the code 19 days ago, but the fix doesn't appear to be scheduled for release any time soon.