k8up
Jobs are stuck in terminating when using Kubernetes Job Tracking
Description
Many jobs do not complete; they hang in the "Terminating" state. As a workaround, I edit each job manually and set `finalizers: []`.
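The manual workaround described above can be sketched as a patch that clears the stuck finalizers (the namespace and object names are taken from the YAML dumps later in this issue; adjust them for your cluster):

```shell
# Sketch of the manual workaround: clear the finalizers so the stuck
# Job and its Pod can be garbage-collected. Requires kubectl access to
# the affected cluster; names below come from this report.
kubectl -n test123 patch job backup-schedule-backup-test123-backup-ptzxv \
  --type=merge -p '{"metadata":{"finalizers":[]}}'
kubectl -n test123 patch pod backup-schedule-backup-test123-backup-ptzxv--1-lfv7n \
  --type=merge -p '{"metadata":{"finalizers":[]}}'
```

Note that removing finalizers bypasses the controller that set them, so this should only be used to unstick objects that are already completed.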
Additional Context
No response
Logs
No response
Expected Behavior
Jobs should complete successfully and not get stuck in the "Terminating" status.
Steps To Reproduce
No response
Version of K8up
v2.1.2
Version of Kubernetes
v1.22.6+k3s1
Distribution of Kubernetes
k3s
Hi. Can you please provide logs, your specs, and the steps to reproduce? It would also be helpful to post the YAML of the job before manually removing the finalizers. It's hard to help you if we don't have the relevant information and can't reproduce the error ourselves. Thank you :)
Hi @vring0
Could you please provide the information requested by ccremer? Otherwise it's not possible for us to help you with the issues you're having.
I am currently on vacation. The problem is real; I will follow up when I return.
I changed the IP addresses in the variables to 127.0.0.1
Terminating Job YAML
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    batch.kubernetes.io/job-tracking: ""
  creationTimestamp: "2022-07-22T04:00:03Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2022-07-22T05:00:17Z"
  finalizers:
  - foregroundDeletion
  generation: 2
  labels:
    k8up.io/type: backup
    k8upjob: "true"
  name: backup-schedule-backup-test123-backup-ptzxv
  namespace: test123
  ownerReferences:
  - apiVersion: k8up.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Backup
    name: schedule-backup-test123-backup-ptzxv
    uid: 58f7adbf-13b6-4aa2-b19b-43568ca3832d
  resourceVersion: "18690552"
  uid: 057b0203-b64f-4f15-8b0b-0b6211a12a6a
spec:
  backoffLimit: 6
  completionMode: NonIndexed
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: 057b0203-b64f-4f15-8b0b-0b6211a12a6a
  suspend: false
  template:
    metadata:
      creationTimestamp: null
      labels:
        controller-uid: 057b0203-b64f-4f15-8b0b-0b6211a12a6a
        job-name: backup-schedule-backup-test123-backup-ptzxv
        k8upjob: "true"
    spec:
      containers:
      - command:
        - /usr/local/bin/k8up
        - restic
        env:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              key: AWS_ACCESS_KEY_ID
              name: aws-credentials
        - name: STATS_URL
        - name: FILEEXTENSION_ANNOTATION
          value: k8up.io/file-extension
        - name: PROM_URL
          value: http://127.0.0.1/
        - name: BACKUPCOMMAND_ANNOTATION
          value: k8up.io/backupcommand
        - name: HOSTNAME
          value: test123
        - name: RESTIC_PASSWORD
          valueFrom:
            secretKeyRef:
              key: K8UP_PASSWORD
              name: backup-repo
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: AWS_SECRET_ACCESS_KEY
              name: aws-credentials
        - name: RESTIC_REPOSITORY
          value: s3:http://127.0.0.1:9000/test123-master
        image: ghcr.io/k8up-io/k8up:v2.1.2
        imagePullPolicy: IfNotPresent
        name: backup
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /data/test123-claim
          name: test123-claim
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: OnFailure
      schedulerName: default-scheduler
      securityContext:
        runAsUser: 13001
      serviceAccount: pod-executor
      serviceAccountName: pod-executor
      terminationGracePeriodSeconds: 30
      volumes:
      - name: test123-claim
        persistentVolumeClaim:
          claimName: test123-claim
status:
  completionTime: "2022-07-22T04:00:17Z"
  conditions:
  - lastProbeTime: "2022-07-22T04:00:17Z"
    lastTransitionTime: "2022-07-22T04:00:17Z"
    status: "True"
    type: Complete
  startTime: "2022-07-22T04:00:03Z"
  succeeded: 1
  uncountedTerminatedPods: {}
```
Terminating Pod YAML
```yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-07-22T04:00:03Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2022-07-22T05:00:17Z"
  finalizers:
  - batch.kubernetes.io/job-tracking
  generateName: backup-schedule-backup-test123-backup-ptzxv--1-
  labels:
    controller-uid: 057b0203-b64f-4f15-8b0b-0b6211a12a6a
    job-name: backup-schedule-backup-test123-backup-ptzxv
    k8upjob: "true"
  name: backup-schedule-backup-test123-backup-ptzxv--1-lfv7n
  namespace: test123
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: backup-schedule-backup-test123-backup-ptzxv
    uid: 057b0203-b64f-4f15-8b0b-0b6211a12a6a
  resourceVersion: "18690553"
  uid: 14227250-376b-410c-bf5c-d00996402a39
spec:
  containers:
  - command:
    - /usr/local/bin/k8up
    - restic
    env:
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          key: AWS_ACCESS_KEY_ID
          name: aws-credentials
    - name: STATS_URL
    - name: FILEEXTENSION_ANNOTATION
      value: k8up.io/file-extension
    - name: PROM_URL
      value: http://127.0.0.1/
    - name: BACKUPCOMMAND_ANNOTATION
      value: k8up.io/backupcommand
    - name: HOSTNAME
      value: test123
    - name: RESTIC_PASSWORD
      valueFrom:
        secretKeyRef:
          key: K8UP_PASSWORD
          name: backup-repo
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          key: AWS_SECRET_ACCESS_KEY
          name: aws-credentials
    - name: RESTIC_REPOSITORY
      value: s3:http://127.0.0.1:9000/test123-master
    image: ghcr.io/k8up-io/k8up:v2.1.2
    imagePullPolicy: IfNotPresent
    name: backup
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /data/test123-claim
      name: test123-claim
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-xftnb
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: prod-2
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: OnFailure
  schedulerName: default-scheduler
  securityContext:
    runAsUser: 13001
  serviceAccount: pod-executor
  serviceAccountName: pod-executor
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: test123-claim
    persistentVolumeClaim:
      claimName: test123-claim
  - name: kube-api-access-xftnb
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-07-22T04:00:03Z"
    reason: PodCompleted
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-07-22T04:00:17Z"
    reason: PodCompleted
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-07-22T04:00:17Z"
    reason: PodCompleted
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-07-22T04:00:03Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://f3fe195b3f9ef93cb30d386f1294ec293732aa29bb8fbc16d7c5ca324afc3510
    image: ghcr.io/k8up-io/k8up:v2.1.2
    imageID: ghcr.io/k8up-io/k8up@sha256:533b4c016d392fd70b52bae69d6321da46d41af01e6d8095da1f2da7610c0c0c
    lastState: {}
    name: backup
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://f3fe195b3f9ef93cb30d386f1294ec293732aa29bb8fbc16d7c5ca324afc3510
        exitCode: 0
        finishedAt: "2022-07-22T04:00:16Z"
        reason: Completed
        startedAt: "2022-07-22T04:00:04Z"
  hostIP: 142.91.9.214
  phase: Succeeded
  podIP: 10.42.2.58
  podIPs:
  - ip: 10.42.2.58
  qosClass: BestEffort
  startTime: "2022-07-22T04:00:03Z"
```
Logs from the backup Pod
```
1.658462404121403e+09 INFO k8up Starting k8up… {"version": "2.1.2", "date": "2022-03-30T11:52:11Z", "commit": "18c6c8c325db35068327533906b158ea29ace4ae", "go_os": "linux", "go_arch": "amd64", "go_version": "go1.17.7", "uid": 13001, "gid": 0}
1.658462404121761e+09 INFO k8up.restic initializing
1.6584624041217833e+09 INFO k8up.restic setting up a signal handler
1.658462404122068e+09 INFO k8up.restic.restic using the following restic options {"options": [""]}
1.6584624041222034e+09 INFO k8up.restic.restic.RepoInit.command restic command {"path": "/usr/local/bin/restic", "args": ["init", "--option", ""]}
1.6584624041222608e+09 INFO k8up.restic.restic.RepoInit.command Defining RESTIC_PROGRESS_FPS {"frequency": 0.016666666666666666}
1.658462404450951e+09 INFO k8up.restic.restic.unlock unlocking repository {"all": false}
1.6584624044510314e+09 INFO k8up.restic.restic.unlock.command restic command {"path": "/usr/local/bin/restic", "args": ["unlock", "--option", ""]}
1.658462404451058e+09 INFO k8up.restic.restic.unlock.command Defining RESTIC_PROGRESS_FPS {"frequency": 0.016666666666666666}
1.658462405552401e+09 INFO k8up.restic.restic.unlock.restic.stderr successfully removed locks
1.6584624055558498e+09 INFO k8up.restic.restic.snapshots getting list of snapshots
1.6584624055560563e+09 INFO k8up.restic.restic.snapshots.command restic command {"path": "/usr/local/bin/restic", "args": ["snapshots", "--option", "", "--json"]}
1.6584624055561106e+09 INFO k8up.restic.restic.snapshots.command Defining RESTIC_PROGRESS_FPS {"frequency": 0.016666666666666666}
1.6584624097351136e+09 INFO k8up.restic.k8sClient listing all pods {"annotation": "k8up.io/backupcommand", "namespace": "test123"}
1.658462409842674e+09 INFO k8up.restic backups of annotated jobs have finished successfully
1.6584624098427212e+09 INFO k8up.restic.restic.backup starting backup
1.6584624098428273e+09 INFO k8up.restic.restic.backup starting backup for folder {"foldername": "test123-claim"}
1.6584624098428476e+09 INFO k8up.restic.restic.backup.command restic command {"path": "/usr/local/bin/restic", "args": ["backup", "--option", "", "--json", "--host", "test123", "/data/test123-claim"]}
1.6584624098428583e+09 INFO k8up.restic.restic.backup.command Defining RESTIC_PROGRESS_FPS {"frequency": 0.016666666666666666}
1.6584624139119163e+09 INFO k8up.restic.restic.backup.progress progress of backup {"percentage": "0.00%"}
1.658462414998352e+09 INFO k8up.restic.restic.backup.progress backup finished {"new files": 0, "changed files": 1, "errors": 0}
1.6584624149984374e+09 INFO k8up.restic.restic.backup.progress stats {"time": 4.297125158, "bytes added": 4319383, "bytes processed": 93481350}
1.6584624149986672e+09 INFO k8up.restic.statsHandler.promStats sending prometheus stats {"url": "http://127.0.0.1/"}
1.6584624150027742e+09 ERROR k8up.restic.restic prometheus send failed {"error": "Post \"http://127.0.0.1/metrics/job/restic_backup/instance/test123\": dial tcp 127.0.0.1:80: connect: connection refused"}
github.com/k8up-io/k8up/restic/logging.(*BackupOutputParser).out
	/home/runner/work/k8up/k8up/restic/logging/logging.go:162
github.com/k8up-io/k8up/restic/logging.writer.Write
	/home/runner/work/k8up/k8up/restic/logging/logging.go:103
io.copyBuffer
	/opt/hostedtoolcache/go/1.17.7/x64/src/io/io.go:425
io.Copy
	/opt/hostedtoolcache/go/1.17.7/x64/src/io/io.go:382
os/exec.(*Cmd).writerDescriptor.func1
	/opt/hostedtoolcache/go/1.17.7/x64/src/os/exec/exec.go:311
os/exec.(*Cmd).Start.func1
	/opt/hostedtoolcache/go/1.17.7/x64/src/os/exec/exec.go:441
1.6584624150521045e+09 INFO k8up.restic.restic.backup backup finished, sending snapshot list
1.658462415052187e+09 INFO k8up.restic.restic.snapshots getting list of snapshots
1.658462415052233e+09 INFO k8up.restic.restic.snapshots.command restic command {"path": "/usr/local/bin/restic", "args": ["snapshots", "--option", "", "--json"]}
1.658462415052272e+09 INFO k8up.restic.restic.snapshots.command Defining RESTIC_PROGRESS_FPS {"frequency": 0.016666666666666666}
```
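For anyone debugging a similar hang, the blocking finalizers can be read directly from the stuck objects; a sketch using the names from this report (adjust namespace and names for your cluster):

```shell
# Diagnostic sketch: print the finalizers that keep the Job and its Pod
# in "Terminating". Requires kubectl access to the affected cluster.
kubectl -n test123 get job backup-schedule-backup-test123-backup-ptzxv \
  -o jsonpath='{.metadata.finalizers}{"\n"}'
kubectl -n test123 get pod backup-schedule-backup-test123-backup-ptzxv--1-lfv7n \
  -o jsonpath='{.metadata.finalizers}{"\n"}'
```

In this report those print `foregroundDeletion` on the Job and `batch.kubernetes.io/job-tracking` on the Pod, which points at the Job tracking feature discussed below.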
Hi. I see you have enabled the Job tracking feature gate on the cluster (https://kubernetes.io/docs/concepts/workloads/controllers/job/#job-tracking-with-finalizers). This is a beta feature of Kubernetes 1.23 (alpha in 1.22), and the same issue has already been discussed here: https://github.com/k8up-io/k8up/issues/554
In short, I believe this is an issue on Kubernetes' side, not in K8up. K8up doesn't add this finalizer.
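Since the feature gate is alpha in 1.22 and had to be enabled explicitly, one possible mitigation until upgrading is to turn it back off. A sketch for k3s (flag syntax is an assumption; verify against your k3s version's server documentation):

```shell
# Possible mitigation sketch for k3s on Kubernetes 1.22: disable the
# JobTrackingWithFinalizers feature gate on both the API server and the
# controller manager, then restart the k3s server.
k3s server \
  --kube-apiserver-arg=feature-gates=JobTrackingWithFinalizers=false \
  --kube-controller-manager-arg=feature-gates=JobTrackingWithFinalizers=false
```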
@vring0
Do you still encounter this issue with Kubernetes 1.23+? According to https://github.com/k8up-io/k8up/issues/554 it seems to work with 1.23.
Not yet, thx!