
kube-task process stuck and pod never terminates

Open rumi-spock opened this issue 4 years ago • 11 comments

Hi

We are using kube-tasks to create Jenkins backups, following the guidelines from the Jenkins Helm chart. Generally it works great, but I have noticed that very often our backup process gets stuck and never exits, leaving the pod in the Running state. Unless this stuck pod is deleted, no new backup job is triggered.

Another thing I noticed is that it always gets stuck when copying slave node files, and it does throw an error:

```
2020/06/19 02:30:35 error in Stream: command terminated with exit code 1  src: file: engineering/eng-jenkins-74775f7d68-285n8/jenkins/var/jenkins_home/nodes/jenkins.bugfix-ompl-1144.7-bq5dp-vbjn9/config.xml
2020/06/19 02:30:35 [011414/109652] done: k8s://engineering/eng-jenkins-74775f7d68-285n8/jenkins/var/jenkins_home/nodes/jenkins.bugfix-ompl-1144.7-bq5dp-vbjn9/config.xml -> s3://jenkins-engineering-tools-backup/20200619020006/nodes/jenkins.bugfix-ompl-1144.7-bq5dp-vbjn9/config.xml
```

These are the last lines in the logs.

One solution would be an exclude-paths option, so that we can pass another parameter with a list of paths to exclude.
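A minimal sketch of what such an exclude filter could look like, assuming the backup has a list of relative file paths before it starts copying. The names `isExcluded` and `filterPaths` are hypothetical and are not part of kube-tasks or skbn; this only illustrates prefix-based exclusion.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isExcluded reports whether p is one of the excluded paths or sits below one.
// Both sides are cleaned so that "nodes/" and "nodes" behave the same.
func isExcluded(p string, excludes []string) bool {
	p = path.Clean(p)
	for _, e := range excludes {
		e = path.Clean(e)
		if p == e || strings.HasPrefix(p, e+"/") {
			return true
		}
	}
	return false
}

// filterPaths returns the paths that survive the exclude list, in order.
func filterPaths(paths, excludes []string) []string {
	kept := make([]string, 0, len(paths))
	for _, p := range paths {
		if !isExcluded(p, excludes) {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	// Example paths relative to the Jenkins home directory (illustrative only).
	paths := []string{
		"config.xml",
		"nodes/agent-1/config.xml",
		"support/support_2020-10-08.zip",
		"jobs/build/config.xml",
	}
	for _, p := range filterPaths(paths, []string{"nodes", "support/"}) {
		fmt.Println(p)
	}
}
```

Matching on whole path segments (the `e+"/"` check) keeps a directory such as `nodesx` from being dropped when only `nodes` is excluded.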

rumi-spock avatar Jun 22 '20 16:06 rumi-spock

I just experienced the same error

cmcga1125 avatar Jul 21 '20 15:07 cmcga1125

I'm experiencing the same problem.

2020/10/08 16:02:56 error in Stream: command terminated with exit code 1 src: file: jenkins/jenkins-7cff7d695d-8k5h4/jenkins/var/jenkins_home/support/support_2020-10-08_12.53.03.zip

I've restarted the jenkins-backup pod and the backup process gets stuck again with a different file.

2020/10/09 16:43:34 error in Stream: command terminated with exit code 1 src: file: jenkins/jenkins-7cff7d695d-8k5h4/jenkins/var/jenkins_home/support/support_2020-10-09_12.59.03.zip

@rumi-spock How do you pass a parameter with the list of paths to be excluded?

victtsl avatar Oct 09 '20 21:10 victtsl

It seems this project has been adopted by @maorfr; the repo can be found here

taneishamitchell avatar Oct 10 '20 11:10 taneishamitchell

Unless I'm missing something, it doesn't look like there's a way to submit an issue to @maorfr's repo.

victtsl avatar Oct 10 '20 21:10 victtsl

hello!

PRs are welcome!

maorfr avatar Oct 11 '20 07:10 maorfr

Hey @maorfr, I am also experiencing this issue, and it seems the cause is files being deleted by the Jenkins job history lifecycle. I got the error on build number 13, which no longer exists, so I guess the backup job got stuck because the file had been deleted. Is there any way to avoid this behavior? I would really like to make this work rather than come up with a workaround backup job (creating a tar.gz of the Jenkins home dir and uploading it to S3 myself). A flag such as skip_changed or something similar would be very helpful. I believe most people would prefer skipping the changed/deleted files over losing the entire backup.

Thanks

YakobovLior avatar Nov 18 '21 09:11 YakobovLior

@All In case this is relevant to any of you:

Since this repo no longer seems to be maintained, I created my own fork and added a flag that lets you skip files that produce errors. For me the issue was that files got deleted while the backup job was running. Since the backup job gathers all files at the start, this leads to errors when copying files and terminates the job.

So I added the flag; with it, errors are logged but the job keeps running.

If anyone is interested, use the fork: https://github.com/sunoce/kube-tasks

You can find the docker image at: https://hub.docker.com/r/sunoce/kube-tasks/tags

If the maintainer sees this and wants me to create a PR, comment here and I will create one.

EDIT:

Ah, I missed the comment from @maorfr, so I will create a PR for this. For this I have to adjust the skbn module.

EDIT:

I created the PRs: https://github.com/maorfr/kube-tasks/pull/6

sunoce avatar Apr 26 '22 14:04 sunoce

Hey @sunoce, thank you for this fix. I am experiencing the same problem in my backup job. Can your Docker image https://hub.docker.com/r/sunoce/kube-tasks/tags be used directly in backup jobs, since your code is not yet merged into the maorfr repo?

Thanks for your reply in advance.

sudhirnikhade avatar Dec 06 '22 13:12 sudhirnikhade

Hey @sudhirnikhade

You can use the image. If you use the full repo, you will have to use the fork as well.

But I can also finish the pull request; I just forgot about it.

Kind regards

sunoce avatar Dec 06 '22 14:12 sunoce

Hi @sunoce, we implemented https://github.com/sunoce/kube-tasks and it is working. Thank you for that. I have one question, though: is there an option to exclude folders or files when copying data between k8s and the S3 bucket? For example, the builds folder in Jenkins. Thanks for your time!

Thanks, Sudhir

sudhirnikhade avatar Dec 20 '22 13:12 sudhirnikhade

Hi,

I am using this image (https://github.com/amerello/kube-tasks), which solves the problem: "error in Stream: command terminated with exit code 1"

https://hub.docker.com/r/amerello/kube-tasks

It skips this kind of error and carries on; those files are empty.

Hope that helps!!

danielmorillas avatar Nov 16 '23 12:11 danielmorillas