
Files failing to copy does not exit the process

Open Tim020 opened this issue 4 years ago • 4 comments

I am not sure whether this is caused by the parallelism setting, but if a file fails to copy (in our case from K8s to AWS), the parent process is not terminated. This leaves copy jobs spinning without making any progress for days at a time. The desired behaviour is that, should a file fail to copy, the entire process exits with a non-zero status code so that the copy can be retried if desired.

Tim020 avatar Apr 04 '20 13:04 Tim020

For example, we saw this error last night:

2020/04/04 17:23:36 error in Stream: error dialing backend: dial tcp ***.***.***.***:10250: connect: connection refused src: file: default/jenkins-8fb6578f5-59vsm/jenkins/var/jenkins_home/jobs/Supportal/jobs/Post Merge/jobs/supportal/branches/master/builds/16/workflow/3.xml

The backup job has been running ever since. In such scenarios, everything should terminate.

Tim020 avatar Apr 05 '20 10:04 Tim020

Hi @Tim020 I am experiencing the same problem with copying to GCS. Have you solved this problem? If so, how? If not - where did you see this error? In my case the copy process just hangs forever without any error output at all.

rrusmana avatar May 18 '20 06:05 rrusmana

Hey @rrusmana, unfortunately I haven't been able to solve this. I see this error when copying from K8s to AWS: the job encounters an error like the one above and then hangs indefinitely.

Tim020 avatar May 21 '20 09:05 Tim020

We are seeing a similar problem in our Jenkins backup job.

dex80526 avatar Sep 23 '20 20:09 dex80526