[sdk] enable_caching breaks when using CreatePVC: must specify FingerPrint
Environment
- KFP version: 2.0.3 (manifests v1.8 release)
- KFP SDK version:
  kfp 2.4.0
  kfp-kubernetes 1.0.0
  kfp-pipeline-spec 0.2.2
  kfp-server-api 2.0.3
Steps to reproduce
Given the following example:
from kfp import dsl
from kfp import kubernetes

@dsl.component
def test_step():
    print("Hello world")

@dsl.pipeline
def test_pipeline():
    kubernetes.CreatePVC(
        access_modes=["ReadWriteOnce"],
        size="10Mi",
        storage_class_name="default",
    )
    test_step()

# `client` is assumed to be an already-configured kfp.Client instance.
client.create_run_from_pipeline_func(test_pipeline, arguments={}, enable_caching=False)
The pipeline will fail. Note the enable_caching argument: setting it to False triggers the issue.
The CreatePVC step then reports the following error:
F1031 14:29:54.216337 27 main.go:76] KFP driver: driver.Container(pipelineName=test-pipeline, runID=02ad61d6-8b9b-47a7-b626-0d65f3838b42, task="createpvc", component="comp-createpvc", dagExecutionID=9094, componentSpec) failed: failed to create PVC and publish execution createpvc: failed to create cache entrty for create pvc: failed to create task: rpc error: code = InvalidArgument desc = Failed to create a new task due to validation error: Invalid input error: Invalid task: must specify FingerPrint
time="2023-10-31T14:29:54.940Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2023-10-31T14:29:54.940Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"
time="2023-10-31T14:29:54.940Z" level=error msg="cannot save parameter /tmp/outputs/cached-decision" argo=true error="open /tmp/outputs/cached-decision: no such file or directory"
time="2023-10-31T14:29:54.940Z" level=error msg="cannot save parameter /tmp/outputs/condition" argo=true error="open /tmp/outputs/condition: no such file or directory"
Error: exit status 1
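For anyone triaging similar runs, the fatal driver line above can be split into its call arguments and its nested error chain. The following is a minimal sketch using only the Python standard library; the helper name `parse_driver_failure` and the regular expression are illustrative assumptions, not part of KFP, and the pattern is only known to match the line quoted in this report.

```python
import re

# The fatal line from this report, copied verbatim (including the "entrty" typo).
SAMPLE = (
    'F1031 14:29:54.216337 27 main.go:76] KFP driver: driver.Container('
    'pipelineName=test-pipeline, runID=02ad61d6-8b9b-47a7-b626-0d65f3838b42, '
    'task="createpvc", component="comp-createpvc", dagExecutionID=9094, '
    'componentSpec) failed: failed to create PVC and publish execution createpvc: '
    'failed to create cache entrty for create pvc: failed to create task: '
    'rpc error: code = InvalidArgument desc = Failed to create a new task due to '
    'validation error: Invalid input error: Invalid task: must specify FingerPrint'
)

# Hypothetical pattern: capture the argument list and everything after "failed: ".
DRIVER_LINE = re.compile(r"driver\.Container\((?P<args>.*?)\) failed: (?P<error>.*)$")


def parse_driver_failure(line: str) -> dict:
    """Return the key=value arguments and the error chain of a driver failure line."""
    match = DRIVER_LINE.search(line)
    if match is None:
        raise ValueError("not a KFP driver failure line")
    fields = {}
    for part in match.group("args").split(", "):
        key, sep, value = part.partition("=")
        if sep:  # skip bare arguments such as "componentSpec"
            fields[key] = value.strip('"')
    # Wrapped errors are joined with ": ", outermost first.
    fields["error_chain"] = match.group("error").split(": ")
    return fields


if __name__ == "__main__":
    parsed = parse_driver_failure(SAMPLE)
    print(parsed["task"], "->", parsed["error_chain"][-1])
```

Read outermost to innermost, the chain shows that the failure comes from the cache-entry creation inside the PVC step and ends in the server-side validation message, which is consistent with the PVC itself having been created.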
Impacted by this bug? Give it a 👍.
@TobiasGoerke what is the version of your KFP runtime? Maybe there is a bug when resolving cache key in the PVC creation operation. cc @chensun to learn more.
I'm on manifests/v1.8-branch, i.e. 2.0.3.
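To make the cache-key hypothesis raised above concrete, here is a toy model of one plausible reading of the error message. It is not KFP's driver or API server code: every name in it (`compute_fingerprint`, `create_cache_entry`, `publish_unguarded`, `publish_guarded`) is hypothetical, and the assumption that no fingerprint is computed when caching is disabled is a guess based only on the log output.

```python
import hashlib
import json


class InvalidTaskError(ValueError):
    """Stands in for the server-side InvalidArgument validation error."""


def compute_fingerprint(task_spec: dict) -> str:
    # Hypothetical cache key: a stable hash of the task spec.
    payload = json.dumps(task_spec, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def create_cache_entry(fingerprint: str) -> dict:
    # Mirrors the validation message in the report: an empty fingerprint is rejected.
    if not fingerprint:
        raise InvalidTaskError("Invalid task: must specify FingerPrint")
    return {"fingerprint": fingerprint}


def publish_unguarded(task_spec: dict, enable_caching: bool) -> dict:
    # Assumed buggy flow: the cache entry is always created, even when
    # caching is off and no fingerprint was computed.
    fingerprint = compute_fingerprint(task_spec) if enable_caching else ""
    return create_cache_entry(fingerprint)


def publish_guarded(task_spec: dict, enable_caching: bool):
    # Assumed fix: skip the cache entry entirely when caching is disabled.
    if not enable_caching:
        return None
    return create_cache_entry(compute_fingerprint(task_spec))
```

Under these assumptions, `publish_unguarded(spec, False)` raises the same validation message seen in the logs, while `publish_guarded(spec, False)` simply skips the cache write. Whether the real driver behaves this way would need to be confirmed against the backend source.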
I am facing exactly the same issue, with the same output, on KFP backend 2.0.3 with a Kubeflow 1.8.0 manifests deployment.
The PVC is created, but the component reports the error below in its logs and exits with an error.
F1117 21:35:33.015147 22 main.go:76] KFP driver: driver.Container(pipelineName=my-pipeline, runID=cd147529-1b6c-454b-b3e1-b2858ff98222, task="createpvc", component="comp-createpvc", dagExecutionID=29, componentSpec) failed: failed to create PVC and publish execution createpvc: failed to create cache entrty for create pvc: failed to create task: rpc error: code = InvalidArgument desc = Failed to create a new task due to validation error: Invalid input error: Invalid task: must specify FingerPrint
time="2023-11-17T21:35:33.321Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2023-11-17T21:35:33.322Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"
time="2023-11-17T21:35:33.322Z" level=error msg="cannot save parameter /tmp/outputs/cached-decision" argo=true error="open /tmp/outputs/cached-decision: no such file or directory"
time="2023-11-17T21:35:33.322Z" level=error msg="cannot save parameter /tmp/outputs/condition" argo=true error="open /tmp/outputs/condition: no such file or directory"
Error: exit status 1
Some additional info: after hitting this issue, the KFP backend stopped working in my case.
I had to restart all the deployments with kubectl -n kubeflow rollout restart deployments
before I could run v2 pipelines again.
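The restart described above can be scripted. This is a minimal sketch that wraps the exact kubectl command from the comment; the function names and the `dry_run` flag are illustrative assumptions, and actually running it requires kubectl on the PATH with access to the cluster.

```python
import subprocess
from typing import List


def restart_command(namespace: str = "kubeflow") -> List[str]:
    # Same command as quoted in the comment, split into an argument list.
    return ["kubectl", "-n", namespace, "rollout", "restart", "deployments"]


def restart_deployments(namespace: str = "kubeflow", dry_run: bool = True) -> List[str]:
    """Build the restart command and, unless dry_run is set, execute it."""
    cmd = restart_command(namespace)
    if not dry_run:
        # check=True raises CalledProcessError if kubectl exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Print the command without touching the cluster.
    print(" ".join(restart_deployments(dry_run=True)))
```

The dry-run default is deliberate: restarting every deployment in the namespace interrupts running workloads, so the command should be reviewed before it is executed.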
With api-server 2.0.5 and enable_caching=False, this issue still exists.
- KFP Backend API-SERVER version: 2.0.5 (manifests v1.8 release modified)
- KFP SDK version:
  kfp 2.4.0
  kfp-kubernetes 1.0.0
  kfp-pipeline-spec 0.2.2
  kfp-server-api 2.0.5
@yingding is it working now?
@kabartay Unfortunately, this issue still exists, even with:
- KFP Backend API-SERVER version: 2.0.5 (manifests v1.8 release modified)
- KFP SDK version:
  kfp 2.6.0
  kfp-kubernetes 1.1.0
  kfp-pipeline-spec 0.3.0
  kfp-server-api 2.0.5
Hopefully it can be resolved in the next KFP backend API server release.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
/reopen
It seems this issue has not been resolved yet.
@AnnKatrinBecker: You can't reopen an issue/PR unless you authored it or you are a collaborator.
/reopen
@HumairAK: Reopened this issue.