kfp-tekton

push_artifact does not push output artifacts to s3 in copy-artifacts step

Open: HumairAK opened this issue 1 year ago • 3 comments

/kind bug

What steps did you take and what happened:

When using data passing via kfp.components.OutputPath() and kfp.components.InputPath(), we notice that artifacts never make it into s3 storage; instead, we see a 0 KB file named after the output file.

What did you expect to happen: Output artifacts show up in s3 storage.

Additional information:

When reproducing this, please use a backing storage for the PVC that is separate from the s3 solution used by the apiserver.

Take for example:

  1. Pipeline in the kfp DSL
  2. Same Pipeline after it's fed through the kfp-tekton compiler
  3. Same Pipeline after it is adjusted by the apiserver, post-submit

Once the Pipeline in (1) makes it to the apiserver (3), we see new task steps added to manage artifact passing/tracking. The final step added by the apiserver is copy-artifacts; this step pushes the task's artifacts to s3 storage via the push_artifact script. The problem we are seeing is that this fails when the artifact is larger than 4 KB.

This step expects the artifact to be in /tekton/home/tep-results, but all you find there is a 0 KB file with the artifact's output name. This happens because, at more than 3072 bytes, the artifact is too big for copy-results-artifacts to copy, so it never reaches /tekton/home. The copy-artifacts step contains:

if [ -d /tekton/results ]; then mkdir -p /tekton/home/tep-results; mv /tekton/results/* /tekton/home/tep-results/ || true; fi
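To make the step concrete, here is a rough Python model of that one-liner. It is a sketch for illustration only: the function name move_results_to_home is hypothetical, and temporary directories stand in for /tekton/results and /tekton/home. It shows that the step moves whatever it finds, so a 0-byte placeholder arrives in tep-results unchanged.

```python
import os
import shutil
import tempfile


def move_results_to_home(results_dir: str, home_dir: str) -> list:
    """Rough model of the copy-artifacts one-liner (hypothetical helper).

    If results_dir exists, create <home_dir>/tep-results and move every entry
    of results_dir into it. Returns the names that were moved.
    """
    moved = []
    if not os.path.isdir(results_dir):
        return moved  # `if [ -d /tekton/results ]` is false: do nothing
    dest = os.path.join(home_dir, "tep-results")
    os.makedirs(dest, exist_ok=True)  # mkdir -p
    for name in sorted(os.listdir(results_dir)):
        try:
            shutil.move(os.path.join(results_dir, name), os.path.join(dest, name))
            moved.append(name)
        except OSError:
            pass  # approximates `|| true`: failures do not fail the step
    return moved


if __name__ == "__main__":
    # Temporary directories stand in for /tekton/results and /tekton/home.
    root = tempfile.mkdtemp()
    results = os.path.join(root, "results")
    home = os.path.join(root, "home")
    os.makedirs(results)
    # A 0-byte placeholder, like the empty "mydestfile" described in the issue.
    open(os.path.join(results, "mydestfile"), "w").close()
    print(move_results_to_home(results, home))  # ['mydestfile']
    print(os.path.getsize(os.path.join(home, "tep-results", "mydestfile")))  # 0
```

The model is size-agnostic on purpose: whatever ends up in the results directory is what push_artifact later sees.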

This takes the contents of /tekton/results/ and moves them to /tekton/home/tep-results. In the preceding step, copy-results-artifacts, we see:

 copy_artifact $(workspaces.produce-output.path)/artifacts/simple-pipeline-fe138/$(context.taskRun.name)/mydestfile $(results.mydestfile.path)

So we expect the contents of /workspace to be moved to /tekton/results, so that they can then be moved to /tekton/home in the next step.

However, when the pipeline is fed through the compiler in (2) above, we see that the copy-results-artifacts script it adds only moves the contents of /workspace if they are smaller than 3072 bytes. (This makes sense, since we have to stay under 4 KB to avoid the termination message error, right?)

Since this file is ~20 MB, the move doesn't happen. Instead, an empty file is created in its place, and that empty file trickles into push_artifact.
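The size-gated behavior described above can be modeled roughly as follows. This is a hypothetical Python sketch, not the script the kfp-tekton compiler actually emits: gated_copy only mirrors the described effect, namely that the destination file is always created but the contents are copied only when the artifact is under 3072 bytes. A large artifact therefore leaves an empty file behind.

```python
import os
import shutil
import tempfile

SIZE_LIMIT = 3072  # bytes, the threshold described in the issue


def gated_copy(src: str, dst: str, limit: int = SIZE_LIMIT) -> bool:
    """Hypothetical model of the compiler-added copy in copy-results-artifacts.

    The destination is always created (like `touch`), but the artifact's
    contents are only copied when it is smaller than `limit` bytes.
    Returns True if the contents were copied.
    """
    open(dst, "a").close()  # an empty file exists at dst no matter what
    if os.path.getsize(src) < limit:
        shutil.copyfile(src, dst)
        return True
    return False  # too big: dst stays empty, and that empty file flows onward


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    small_src = os.path.join(workdir, "small")
    big_src = os.path.join(workdir, "big")
    with open(small_src, "wb") as f:
        f.write(b"x" * 100)
    with open(big_src, "wb") as f:
        f.write(b"x" * 20_000)  # stands in for the ~20 MB artifact
    print(gated_copy(small_src, os.path.join(workdir, "small.out")))  # True
    print(gated_copy(big_src, os.path.join(workdir, "big.out")))  # False
    print(os.path.getsize(os.path.join(workdir, "big.out")))  # 0
```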

We noticed that simply taking the output artifact path arguments for push_artifact from the paths stored in the tekton.dev/artifact_items annotation seemed to work (example here). This could be a trivial change, though I'm not sure it accounts for everything.
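A minimal Python sketch of that idea, assuming the tekton.dev/artifact_items annotation holds JSON that maps each task name to a list of [artifact name, path] pairs. That shape, the task name produce-output, and the helper artifact_paths are assumptions for illustration; check a real compiled PipelineRun before relying on them.

```python
import json


def artifact_paths(annotation_value: str, task_name: str) -> dict:
    """Return {artifact_name: path} for one task (hypothetical helper).

    ASSUMED annotation shape: {"<task>": [["<name>", "<path>"], ...], ...}.
    Tasks missing from the annotation yield an empty dict.
    """
    items = json.loads(annotation_value)
    return {name: path for name, path in items.get(task_name, [])}


if __name__ == "__main__":
    # Illustrative annotation value reusing the path from copy-results-artifacts.
    annotation = json.dumps({
        "produce-output": [
            ["mydestfile",
             "$(workspaces.produce-output.path)/artifacts/simple-pipeline-fe138/"
             "$(context.taskRun.name)/mydestfile"],
        ]
    })
    for name, path in artifact_paths(annotation, "produce-output").items():
        # A push step would read the full artifact from `path` (in the
        # workspace) instead of the possibly empty copy under /tekton/home.
        print(name, path)
```

The Tekton variables in the path are left unresolved here; at runtime they would be substituted before the path is used.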

As a workaround, we are looking into using a custom push_artifact script that looks for the artifact in the workspace (if it exists) and then pushes that path to s3.
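One way such a workaround could pick its source path is sketched below in Python. The helper name, its parameters, and the fallback order are hypothetical; the actual upload to s3 is deliberately left out rather than guessed at.

```python
import os
import tempfile


def resolve_artifact_source(workspace_path: str, name: str,
                            home_dir: str = "/tekton/home") -> str:
    """Pick the path a custom push step should upload (hypothetical helper).

    Prefer the full artifact in the workspace if it exists; otherwise fall
    back to <home_dir>/tep-results/<name>, where the default flow puts it.
    """
    if os.path.exists(workspace_path):
        return workspace_path
    return os.path.join(home_dir, "tep-results", name)


if __name__ == "__main__":
    ws = tempfile.mkdtemp()
    artifact = os.path.join(ws, "mydestfile")
    with open(artifact, "wb") as f:
        f.write(b"payload")
    print(resolve_artifact_source(artifact, "mydestfile") == artifact)  # True
    missing = os.path.join(ws, "not-there")
    print(resolve_artifact_source(missing, "mydestfile"))
```

Checking existence rather than size keeps the sketch simple; a stricter variant could also skip zero-byte workspace files.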

Environment:

  • SDK Version: 1.5.1
  • Tekton Version (use tkn version): 0.47.x
  • Kubernetes Version (use kubectl version): 1.25

HumairAK, Jul 18 '23 19:07

We notice the same behavior when not using .add_pod_annotation() for pipelines that use data passing. Example.

HumairAK, Jul 18 '23 19:07

We notice the same behavior when not using .add_pod_annotation() for pipelines that use data passing

It's a bit nuanced. What I've seen is:

  • if I have a 2-step pipeline (step1 with an output, and step2 with an input and an output):
    • if I leave off the artifact_outputs annotation on step2, step2's artifact gets uploaded to MinIO, but step2 goes into a failure state with the message Error while handling results: Termination message is above max allowed size 4096. Example
    • if I include the artifact_outputs annotation on step2, step2 uploads a 0-byte tgz archive to MinIO, and step2 shows success. Example

I can't get both a success state and a successful upload at the same time.

gregsheremeta, Jul 19 '23 19:07

@gregsheremeta This is because, in your example, you'll notice that the artifact output gets moved to /tekton/results.

push_artifact pushes everything that lands in /tekton/results (copy-results-artifacts copies the artifact into /tekton/results, and copy-artifacts then moves it to /tekton/home/tep-results), so in this case the artifact does get pushed to s3. But because /tekton/results now contains a file larger than 4 KB, we get the termination message error.
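The tradeoff can be checked with a rough Python sketch that sums the bytes waiting in a results directory and compares them with the 4096-byte limit from the error message. The helper names are hypothetical, and the count ignores how the termination message actually encodes results, so it only gives a lower bound on the real message size.

```python
import os
import tempfile

TERMINATION_MESSAGE_LIMIT = 4096  # bytes, per the "max allowed size 4096" error


def results_size(results_dir: str) -> int:
    """Total size in bytes of the regular files in a results directory."""
    return sum(
        os.path.getsize(os.path.join(results_dir, name))
        for name in os.listdir(results_dir)
        if os.path.isfile(os.path.join(results_dir, name))
    )


def fits_termination_message(results_dir: str,
                             limit: int = TERMINATION_MESSAGE_LIMIT) -> bool:
    # Raw byte count only: encoding overhead in the real termination
    # message is ignored, so passing here does not guarantee success.
    return results_size(results_dir) <= limit


if __name__ == "__main__":
    rdir = tempfile.mkdtemp()
    # Empty placeholder only: fits, but nothing useful gets pushed.
    open(os.path.join(rdir, "mydestfile"), "w").close()
    print(fits_termination_message(rdir))  # True
    # Full artifact moved into results: it gets pushed, but the step fails.
    with open(os.path.join(rdir, "mydestfile"), "wb") as f:
        f.write(b"x" * 20_000)
    print(fits_termination_message(rdir))  # False
```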

HumairAK, Jul 19 '23 19:07