[backend] Can't run a data-passing pipeline multiple times with the same pipeline function name. Error: This step is in Error state with this message: Error (exit code 1): key unsupported: cannot get key for artifact location, because it is invalid
Environment
- Kubeflow Pipelines Standalone on a local cluster.
- KFP version: 1.7.0
- KFP SDK version: 1.8.2
Steps to reproduce
Dear developers, I am trying to run the same pipeline many times with different arguments (line and count), using the official data passing example shown below.
Initially, the main pipeline function was named pipeline_func. The pipeline ran successfully the first time, but on the second run the error above occurred. It looks the same as the closed issue https://github.com/kubeflow/pipelines/issues/5948, and it also seems connected to https://github.com/argoproj/argo-workflows/issues/6497.
To narrow down the problem, I renamed the main pipeline function several times (the function name becomes the prefix of the workflow and pod names, as the listing below shows). Here is the process:
# (1) change the pipeline function name, success
# pipeline_func → a-new-pipeline-func-please
a-new-pipeline-func-please-s8tb5-2208969941 # launched pipeline pods
a-new-pipeline-func-please-s8tb5-2839087706
# (2) use the new name, run again, error
a-new-pipeline-func-please-snp4h-1210744786
# (3) change the pipeline function name again, success
# a-new-pipeline-func-please → a-new-pipeline-func-please-again
a-new-pipeline-func-please-again-wfq8v-351450082
a-new-pipeline-func-please-again-wfq8v-766454711
# (4) use the new name, run again, success
a-new-pipeline-func-please-again-pcbzd-2732903275
a-new-pipeline-func-please-again-pcbzd-98665068
# (5) use the new name, run again, error
a-new-pipeline-func-please-again-kg4w4-25440416
Here is the code, taken from the official data passing example:
from kfp.components import InputPath, OutputPath, func_to_container_op
import kfp

kfp_client = kfp.Client()


@func_to_container_op
def repeat_line(line: str, count: int, output_text_path: OutputPath(str)):
    '''Repeat the line the given number of times and write the result to a file.'''
    with open(output_text_path, 'w') as writer:
        for i in range(count):
            writer.write(line + '\n')


@func_to_container_op
def print_text(input_text_path: InputPath()):
    '''Print the contents of the input text file.'''
    with open(input_text_path, 'r') as reader:
        for line in reader:
            print(line, end='')


def a_new_pipeline_func_please_again(line: str = 'Hello', count: int = 10):
    repeat_line_task = repeat_line(line=line, count=count)
    print_text(repeat_line_task.output)


if __name__ == '__main__':
    kfp_client.create_run_from_pipeline_func(
        a_new_pipeline_func_please_again,
        arguments={
            'line': "bbb",
            'count': 5,
        },
    )
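One more observation: the error only appears on repeat runs of a pipeline name that has already been used, which makes me wonder whether step caching plays a role. As an experiment I plan to force a cache miss per task using the KFP v1 caching options (reusing the repeat_line and print_text components defined above); this is just a sketch and I have not verified that it avoids the error:

def a_new_pipeline_func_please_again(line: str = 'Hello', count: int = 10):
    repeat_line_task = repeat_line(line=line, count=count)
    # Unverified experiment: 'P0D' (zero staleness) forces a cache miss on every run.
    repeat_line_task.execution_options.caching_strategy.max_cache_staleness = 'P0D'
    print_task = print_text(repeat_line_task.output)
    print_task.execution_options.caching_strategy.max_cache_staleness = 'P0D'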
Expected result
The pipeline should run successfully every time, even when the same pipeline function name is reused.
About the artifact repository: I found some clues in the kubeflow namespace.
$ kubectl get cm -n kubeflow
NAME                                         DATA   AGE
inferenceservice-config                      9      87d
istio-ca-root-cert                           1      160d
kfp-launcher                                 1      150d
kfserving-config                             1      87d
kfserving-models-web-app-config-mtgm8bbd98   1      87d
metadata-grpc-configmap                      2      150d
ml-pipeline-ui-configmap                     1      150d
pipeline-install-config                      15     150d
workflow-controller-configmap                3      150d
$ kubectl get cm workflow-controller-configmap -n kubeflow -o yaml
apiVersion: v1
data:
  artifactRepository: |    # the artifactRepository configuration.
    archiveLogs: true
    s3:
      endpoint: "minio-service.kubeflow:9000"
      bucket: "mlpipeline"
      # keyFormat is a format pattern to define how artifacts will be organized in a bucket.
      # It can reference workflow metadata variables such as workflow.namespace, workflow.name,
      # pod.name. Can also use strftime formating of workflow.creationTimestamp so that workflow
      # artifacts can be organized by date. If omitted, will use `{{workflow.name}}/{{pod.name}}`,
      # which has potential for have collisions, because names do not guarantee they are unique
      # over the lifetime of the cluster.
      # Refer to https://kubernetes.io/docs/concepts/overview/working-with-objects/names/.
      #
      # The following format looks like:
      # artifacts/my-workflow-abc123/2018/08/23/my-workflow-abc123-1234567890
      # Adding date into the path greatly reduces the chance of {{pod.name}} collision.
      keyFormat: "artifacts/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}"
      # insecure will disable TLS. Primarily used for minio installs not configured with TLS
      insecure: true
      accessKeySecret:
        name: mlpipeline-minio-artifact
        key: accesskey
      secretKeySecret:
        name: mlpipeline-minio-artifact
        key: secretkey
  containerRuntimeExecutor: docker
  executor: |
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 0.01
        memory: 32Mi
      limits:
        cpu: 0.5
        memory: 512Mi
kind: ConfigMap
metadata:
  creationTimestamp: "2021-11-17T03:05:35Z"
  labels:
    application-crd-id: kubeflow-pipelines
  name: workflow-controller-configmap
  namespace: kubeflow
  resourceVersion: "16984168"
  selfLink: /api/v1/namespaces/kubeflow/configmaps/workflow-controller-configmap
  uid: d0b5c55e-ed8a-4b37-835e-6a2b4d3c402c
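If I read the keyFormat correctly, the artifact key for one of the pods above would expand roughly like this (the creation date is hypothetical; Argo substitutes workflow.creationTimestamp at run time):

# Illustration only: manual expansion of the configured keyFormat
# "artifacts/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}"
workflow_name = 'a-new-pipeline-func-please-s8tb5'        # from the pod list above
pod_name = 'a-new-pipeline-func-please-s8tb5-2208969941'  # from the pod list above
year, month, day = '2021', '11', '17'                     # hypothetical creation date
key = f'artifacts/{workflow_name}/{year}/{month}/{day}/{pod_name}'
print(key)
# artifacts/a-new-pipeline-func-please-s8tb5/2021/11/17/a-new-pipeline-func-please-s8tb5-2208969941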
Please help me. Many thanks!
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.
Maybe it's because the @pipeline decorator is missing?
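For reference, adding it would look roughly like this (a minimal sketch reusing the repeat_line and print_text components above; the pipeline name and description are placeholders, and I have not checked whether this changes the behaviour):

import kfp.dsl as dsl

@dsl.pipeline(name='data-passing-pipeline', description='Repeat a line and print it')
def a_new_pipeline_func_please_again(line: str = 'Hello', count: int = 10):
    repeat_line_task = repeat_line(line=line, count=count)
    print_text(repeat_line_task.output)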
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.