
Kubeflow minio pod showing "Error: disk path full"

shubamsharma opened this issue on Jul 11, 2022

/kind question

Question: The Kubeflow MinIO pod is showing "Error: disk path full". Do we have any purging strategy to handle pipeline artifact deletion?

Details:

[root@master ~]# kubectl get po -n kubeflow | grep minio
minio-5579d6498b-87r7m   2/2   Running   2   6d
[root@master ~]#

wait container logs:

time="2022-07-11T06:02:23.468Z" level=info msg="Saving from /tmp/argo/outputs/logs/main.log to s3 (endpoint: minio-service.kubeflow:9000, bucket: mlpipeline, key: artifacts/XXXX-pipeline-6x54b/XXXX-pipeline-6x54b-2106946359/main.log)"
time="2022-07-11T06:02:23.471Z" level=warning msg="Failed to put file: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed."
time="2022-07-11T06:02:39.821Z" level=info msg="S3 Save path: /tmp/argo/outputs/logs/main.log, key: artifacts/XXXX-pipeline-6x54b/XXXX-pipeline-6x54b-2106946359/main.log"
time="2022-07-11T06:02:39.821Z" level=info msg="Creating minio client minio-service.kubeflow:9000 using static credentials"
time="2022-07-11T06:02:39.821Z" level=info msg="Saving from /tmp/argo/outputs/logs/main.log to s3 (endpoint: minio-service.kubeflow:9000, bucket: mlpipeline, key: artifacts/XXXX-pipeline-6x54b/XXXX-pipeline-6x54b-2106946359/main.log)"
time="2022-07-11T06:02:39.827Z" level=warning msg="Failed to put file: Storage backend has reached its minimum free disk threshold. Please delete a few objects to proceed."
time="2022-07-11T06:02:39.827Z" level=error msg="executor error: timed out waiting for the condition"
time="2022-07-11T06:02:39.827Z" level=info msg="Killing sidecars"
time="2022-07-11T06:02:39.831Z" level=info msg="Killing sidecar echo (25c2c53996b4f1facef84c76106e76cf5ec2caef53a713463da8f513165d654e)"
time="2022-07-11T06:02:39.831Z" level=info msg="Killing containers [25c2c53996b4f1facef84c76106e76cf5ec2caef53a713463da8f513165d654e]"
time="2022-07-11T06:02:39.831Z" level=info msg="SIGTERM containerID "25c2c53996b4f1facef84c76106e76cf5ec2caef53a713463da8f513165d654e": terminated"
time="2022-07-11T06:02:39.834Z" level=info msg="https://172.30.0.1:443/api/v1/namespaces/kubeflow/pods/XXXX-pipeline-6x54b-2106946359/exec?command=%2Fbin%2Fsh&command=-c&command=kill+-15+1&container=echo&stderr=true&stdout=false&tty=false"
time="2022-07-11T06:03:09.858Z" level=error msg="executor error: Timeout occurred\ngithub.com/argoproj/argo/v2/errors.Wrap\n\t/go/src/github.com/argoproj/argo/errors/errors.go:88\ngithub.com/argoproj/argo/v2/errors.InternalWrapError\n\t/go/src/github.com/argoproj/argo/errors/errors.go:71\ngithub.com/argoproj/argo/v2/workflow/common.GetExecutorOutput\n\t/go/src/github.com/argoproj/argo/workflow/common/util.go:240\ngithub.com/argoproj/argo/v2/workflow/executor/k8sapi.(*k8sAPIClient).KillContainer\n\t/go/src/github.com/argoproj/argo/workflow/executor/k8sapi/client.go:87\ngithub.com/argoproj/argo/v2/workflow/executor/common.TerminatePodWithContainerID\n\t/go/src/github.com/argoproj/argo/workflow/executor/common/common.go:92\ngithub.com/argoproj/argo/v2/workflow/executor/common.KillGracefully\n\t/go/src/github.com/argoproj/argo/workflow/executor/common/common.go:98\ngithub.com/argoproj/argo/v2/workflow/executor/k8sapi.(*k8sAPIClient).killGracefully\n\t/go/src/github.com/argoproj/argo/workflow/executor/k8sapi/client.go:92\ngithub.com/argoproj/argo/v2/workflow/executor/k8sapi.(*K8sAPIExecutor).Kill\n\t/go/src/github.com/argoproj/argo/workflow/executor/k8sapi/k8sapi.go:71\ngithub.com/argoproj/argo/v2/workflow/executor.(*WorkflowExecutor).KillSidecars\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:1268\ngithub.com/argoproj/argo/v2/cmd/argoexec/commands.waitContainer.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:33\ngithub.com/argoproj/argo/v2/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:62\ngithub.com/argoproj/argo/v2/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:846\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/[email protected]/command.go:887\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:13\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"
time="2022-07-11T06:03:09.858Z" level=info msg="Alloc=6961 TotalAlloc=20663 Sys=71360 NumGC=7 Goroutines=15"
time="2022-07-11T06:03:09.880Z" level=fatal msg="timed out waiting for the condition"

minio pod logs:

API: PutObject(bucket=mlpipeline, object=artifacts/XXXX-pipeline-6x54b/XXXX-pipeline-6x54b-2106946359/main.log)
Time: 06:02:23 UTC 07/11/2022
DeploymentID: 97f93caf-2d5d-4b5c-bbe6-a8aa0be1be70
RequestID: 1700B10F9BED7235
RemoteHost: 10.128.0.213
Host: minio-service.kubeflow:9000
UserAgent: MinIO (linux; amd64) minio-go/v7.0.2
Error: disk path full
       5: cmd/fs-v1-helpers.go:320:cmd.fsCreateFile()
       4: cmd/fs-v1.go:903:cmd.(*FSObjects).putObject()
       3: cmd/fs-v1.go:817:cmd.(*FSObjects).PutObject()
       2: cmd/object-handlers.go:1255:cmd.objectAPIHandlers.PutObjectHandler()
       1: net/http/server.go:1995:http.HandlerFunc.ServeHTTP()

API: PutObject(bucket=mlpipeline, object=artifacts/XXXX-pipeline-6x54b/XXXX-pipeline-6x54b-2106946359/main.log)
Time: 06:02:39 UTC 07/11/2022
DeploymentID: 97f93caf-2d5d-4b5c-bbe6-a8aa0be1be70
RequestID: 1700B1136AC65A77
RemoteHost: 10.128.0.213
Host: minio-service.kubeflow:9000
UserAgent: MinIO (linux; amd64) minio-go/v7.0.2
Error: disk path full
       5: cmd/fs-v1-helpers.go:320:cmd.fsCreateFile()
       4: cmd/fs-v1.go:903:cmd.(*FSObjects).putObject()
       3: cmd/fs-v1.go:817:cmd.(*FSObjects).PutObject()
       2: cmd/object-handlers.go:1255:cmd.objectAPIHandlers.PutObjectHandler()
       1: net/http/server.go:1995:http.HandlerFunc.ServeHTTP()

shubamsharma · Jul 11 '22 08:07
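
For reference, the "disk path full" error means the volume backing MinIO has dropped below its minimum free-space threshold, so the first thing to confirm is how full that volume actually is. A minimal sketch of the check, assuming the default standalone Kubeflow Pipelines layout (a minio Deployment in the kubeflow namespace serving /data from a PVC named minio-pvc, with the MinIO container named minio); adjust the names to your install:

# Free space on the MinIO data path inside the pod
kubectl -n kubeflow exec deploy/minio -c minio -- df -h /data

# Capacity requested by the PVC backing that path
kubectl -n kubeflow get pvc minio-pvc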

@shubamsharma, I think you might need to check https://docs.arrikto.com/ops/kubeflow/minio.html. Hope this helps.

shan100github · Jul 20 '22 14:07
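
The page linked above describes a manual approach to maintaining the bundled MinIO. One common manual remedy is simply to grow the volume behind it; a minimal sketch, assuming the PVC is named minio-pvc in the kubeflow namespace, its StorageClass has allowVolumeExpansion enabled, and 50Gi is just an example target size:

# Request a larger volume for MinIO (only succeeds if the StorageClass allows expansion)
kubectl -n kubeflow patch pvc minio-pvc -p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'

# Some storage drivers only resize the filesystem once the pod is recreated
kubectl -n kubeflow rollout restart deploy/minio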

@shan100github, thanks for the reply.

But the information above describes a manual approach. Is there any way to enable automatic purging of pipeline artifacts?

shubamsharma · Jul 25 '22 11:07
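
For later readers: Kubeflow Pipelines does not ship an automatic artifact purge, so retention is usually enforced on the bucket itself, for example with a scheduled cleanup job (a Kubernetes CronJob or an external cron) that deletes old objects via the MinIO mc client. A minimal sketch of that idea; the alias name, the minio/minio123 credentials, the artifacts/ prefix, and the exact flag syntax are assumptions here and vary with the mc and MinIO versions in your install, so verify them before automating anything:

# Point mc at the in-cluster MinIO; take the real credentials from the mlpipeline-minio-artifact secret (or wherever your install keeps them)
mc alias set kfminio http://minio-service.kubeflow:9000 minio minio123

# Delete pipeline artifacts older than the retention window (duration syntax differs between mc releases, e.g. 30d vs 30)
mc rm --recursive --force --older-than 30d kfminio/mlpipeline/artifacts/

Deleting objects this way frees the disk but does not touch run history: older runs in the KFP UI will simply show their artifacts as missing. Newer MinIO servers also support bucket lifecycle rules (mc ilm), but the older filesystem-mode MinIO bundled here may not honor them, which is why a client-side cleanup job is the usual workaround.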

/close

There has been no activity for a long time. Please reopen if necessary.

juliusvonkohout · Aug 25 '23 10:08

@juliusvonkohout: Closing this issue.

In response to this:

/close

There has been no activity for a long time. Please reopen if necessary.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

google-oss-prow[bot] · Aug 25 '23 10:08