
k3s: Error syncing job: rpc error: code = Unknown desc = database is locked

[Open] decipher27 opened this issue Aug 12 '22 • 16 comments

Production Environment: K3s Version: k3s:v1.21.4-k3s1

Node(s) CPU architecture, OS, and Version: Linux ip-10-151-40-235.ec2.internal 5.4.188-104.359.amzn2.x86_64 #1 SMP Thu Apr 14 20:53:13 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration: EKS clusters with 4 spot nodes.

Describe the bug: We noticed our workflows failing due to a database lock error and traced it to k3s. It would be great to get a solution.

Steps To Reproduce: Not sure how to reproduce this. We are deploying our clusters via Loft (https://loft.sh/).

A virtual Kubernetes cluster in Loft is tied to a single namespace. The virtual cluster and hypervisor run within a single pod that consists of two parts:

  1. A k3s instance which contains the Kubernetes control plane (API server, controller manager and etcd) for the virtual cluster
  2. An instance of a virtual cluster hypervisor which is mainly responsible for syncing cluster resources between the k3s powered virtual cluster and the underlying host cluster. (Ref: https://loft.sh/docs/1.15/self-service/virtual-clusters)
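The k3s logs that follow were collected from the virtual cluster pod itself. A rough sketch of how to pull them, where the namespace, pod, and container names are assumptions based on our release name rather than exact values:

  # stream the k3s container logs from the vcluster pod (names assumed)
  kubectl logs -n vcluster-onboarding-test15 onboarding-test15-0 -c vcluster --tail=200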

Please find the k3s logs below:

E0812 10:58:56.903092       7 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
time="2022-08-12T10:59:00.173157384Z" level=error msg="error in txn: context canceled"
E0812 11:00:01.772938       7 cronjob_controllerv2.go:154] error syncing CronJobController atlas/atlas-statsd-job, requeuing: Operation cannot be fulfilled on cronjobs.batch "atlas-statsd-job": the object has been modified; please apply your changes to the latest version and try again
E0812 11:00:06.640284       7 cronjob_controllerv2.go:154] error syncing CronJobController monitoring/prometheus-health-monitor, requeuing: Operation cannot be fulfilled on cronjobs.batch "prometheus-health-monitor": the object has been modified; please apply your changes to the latest version and try again
time="2022-08-12T11:00:46.209517234Z" level=error msg="error in txn: database is locked"
E0812 11:00:46.209711       7 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"database is locked", Details:[]*anypb.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}: rpc error: code = Unknown desc = database is locked
E0812 11:03:57.679420       7 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
time="2022-08-12T11:03:57.815170693Z" level=error msg="error in txn: context canceled"
time="2022-08-12T11:04:01.362475215Z" level=error msg="Compact failed: failed to compact to revision 48700823: database is locked"
E0812 11:10:00.129181       7 cronjob_controllerv2.go:154] error syncing CronJobController atlas/atlas-statsd-job, requeuing: Operation cannot be fulfilled on cronjobs.batch "atlas-statsd-job": the object has been modified; please apply your changes to the latest version and try again
E0812 11:15:00.143464       7 cronjob_controllerv2.go:154] error syncing CronJobController monitoring/prometheus-health-monitor, requeuing: Operation cannot be fulfilled on cronjobs.batch "prometheus-health-monitor": the object has been modified; please apply your changes to the latest version and try again
E0812 11:18:06.399401       7 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
E0812 11:20:00.138191       7 cronjob_controllerv2.go:154] error syncing CronJobController atlas/atlas-statsd-job, requeuing: Operation cannot be fulfilled on cronjobs.batch "atlas-statsd-job": the object has been modified; please apply your changes to the latest version and try again
time="2022-08-12T11:29:01.370345967Z" level=error msg="Compact failed: failed to compact to revision 48724650: database is locked"
E0812 11:29:10.846344       7 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
E0812 11:30:02.035306       7 cronjob_controllerv2.go:154] error syncing CronJobController atlas/atlas-statsd-job, requeuing: Operation cannot be fulfilled on cronjobs.batch "atlas-statsd-job": the object has been modified; please apply your changes to the latest version and try again
E0812 11:30:04.198513       7 cronjob_controllerv2.go:154] error syncing CronJobController monitoring/prometheus-health-monitor, requeuing: Operation cannot be fulfilled on cronjobs.batch "prometheus-health-monitor": the object has been modified; please apply your changes to the latest version and try again
time="2022-08-12T11:30:08.967942410Z" level=error msg="error in txn: database is locked"
E0812 11:30:08.968146       7 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"database is locked", Details:[]*anypb.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}: rpc error: code = Unknown desc = database is locked
E0812 11:30:08.968653       7 job_controller.go:866] rpc error: code = Unknown desc = database is locked
I0812 11:30:08.968741       7 event.go:291] "Event occurred" object="monitoring/prometheus-health-monitor-27671730" kind="Job" apiVersion="batch/v1" type="Warning" reason="FailedCreate" message="Error creating: rpc error: code = Unknown desc = database is locked"
E0812 11:30:08.969184       7 job_controller.go:404] Error syncing job: rpc error: code = Unknown desc = database is locked

decipher27 commented Aug 12 '22

The "database is locked" error indicates that sqlite's locks are running into conflicts, which is not something that should happen unless something strange is going on.

What is the backing filesystem for the K3s container? The only time I've ever seen this is when using k3s with kine/sqlite on a backing filesystem with poor locking semantics - NFS in particular, or when trying to run multiple copies of K3s against the same local filesystem.
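If it helps narrow that down, something along these lines run against the host cluster would show what the sqlite database file is actually sitting on. The pod and container names and the data path are assumptions about the vcluster setup, so adjust them to wherever the data volume is mounted:

  # print the mount entry backing the k3s data directory (names and path assumed)
  kubectl exec -n vcluster-onboarding-test15 onboarding-test15-0 -c vcluster -- cat /proc/mounts | grep data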

brandond commented Aug 15 '22

We are using an EBS volume for the filesystem, mounted via a PVC.

decipher27 commented Aug 16 '22

What is the filesystem on the EBS volume, and what is the volume type (gp2, st1, so on)? Are you passing K3s any additional datastore options that might change the sqlite locking semantics?

Also, can you reproduce this on more recent versions of k3s? The 1.21 branch is end of life.
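For reference, the volume type and filesystem usually show up on the StorageClass and PersistentVolume objects in the host cluster; a sketch with placeholder object names:

  # which StorageClass the vcluster data PVC uses
  kubectl get pvc <pvc-name> -n <namespace> -o jsonpath='{.spec.storageClassName}{"\n"}'
  # parameters.type on the StorageClass shows the EBS volume type (gp2, st1, ...)
  kubectl get storageclass <storage-class> -o yaml
  # the PV shows the EBS volume ID and fsType
  kubectl get pv <pv-name> -o yaml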

brandond commented Aug 16 '22

Filesystem on EBS volume: ext4
Volume type: gp2
We are not using any additional datastore options for k3s. Do you mean we should try upgrading to v1.24.3+k3s1 and test whether this still happens?

decipher27 commented Aug 16 '22

I would give that a try, although I regularly run K3s on EC2 instances with gp2 volumes and don't have any issues, so I suspect perhaps something else is going on. Can you provide the full yaml output of the k3s pod spec you're using?

brandond commented Aug 16 '22

vCluster Loft: https://loft.sh/docs/getting-started/explore/virtual-clusters
We are using the vcluster template, which uses the k3s image in it. Do you want the yaml output of that? @brandond

decipher27 commented Aug 26 '22

Does that end up running k3s directly in a pod? If so then yes.

brandond commented Aug 29 '22

Please find the file attached: vcluster-pod.txt

decipher27 commented Aug 29 '22

Can you show the volume/pvc for this?

  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-onboarding-test15-0

brandond commented Aug 29 '22

❯ k get pvc data-onboarding-test15-0 -o yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
    volume.kubernetes.io/selected-node: <node-ip.xxxx>
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app: vcluster
    release: onboarding-test15
  name: data-onboarding-test15-0
  namespace: vcluster-onboarding-test15
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: gp2
  volumeMode: Filesystem
  volumeName: pvc-fdfa68be-0984-4f84-a484-abe45f1bb658
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 50Gi
  phase: Bound

decipher27 commented Aug 29 '22

Hmm, that all looks pretty standard. How much load are you putting on that vcluster? The way GP2 throughput provisioning works, at 50GB you're only guaranteed 150 IOPS; is it possible you're exceeding that and the disk itself is blocking? The cloudwatch metrics should be able to tell you that pretty quickly.
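Roughly, gp2 baseline IOPS scale at 3 IOPS per provisioned GiB (with a 100 IOPS floor, and bursting up to 3,000 IOPS on credits for smaller volumes), so:

  50 GiB x 3 IOPS/GiB = 150 baseline IOPS
  3,000 IOPS / 3 IOPS per GiB = ~1,000 GiB of gp2 needed for a 3,000 IOPS baseline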

brandond commented Aug 29 '22

@brandond I checked the write throughput (ops/s) for the affected EBS volumes and noticed it exceeds 150 IOPS and then drops; when it doesn't exceed it, it's right at the edge of the guaranteed IOPS value. This is likely the main cause. What should we consider next, given that the gp2 type does not allow custom IOPS?

[Screenshot attached: CloudWatch write throughput for the affected EBS volume, 2022-08-30 12:11 PM]

gauravkcldcvr commented Aug 30 '22

Your infrastructure provider has placed an upper bound on disk IOPS and you're exceeding it. I am not sure that there is anything to be done on the K3s side; you need to ensure that your storage is sufficient for the workload you're putting on it.

I would probably either change to one of the Provisioned IOPS volume types, or (if you are stuck on General Purpose volumes) over-provision the volume size such that the allocated IOPS meets your required throughput. See: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/general-purpose.html#gp2-performance
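As a sketch, assuming the AWS EBS CSI driver is installed in the host cluster (the StorageClass name and the IOPS/throughput values below are just placeholders), a gp3 StorageClass lets you provision IOPS independent of volume size:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-vcluster            # placeholder name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"                  # gp3 IOPS, independent of volume size
  throughput: "125"             # MiB/s
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

New vcluster PVCs would then pick this up via their storageClassName.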

brandond commented Aug 30 '22

I'm currently testing this by modifying the volume from gp2 to gp3, which allows higher IOPS (e.g. 3000) to be allocated, and observing the behavior.
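For anyone following along, an existing volume can be changed in place with something like the following; the volume ID is a placeholder:

  aws ec2 modify-volume \
    --volume-id vol-0123456789abcdef0 \
    --volume-type gp3 \
    --iops 3000 \
    --throughput 125
  # progress of the modification can be tracked with:
  aws ec2 describe-volumes-modifications --volume-ids vol-0123456789abcdef0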

gauravkcldcvr commented Sep 01 '22

@brandond I calculated IOPS as the sum of the write throughput metric over each period, as shown in the screenshot above. Is that the correct way to calculate it?

gauravkcldcvr commented Sep 01 '22

Total disk IOPS would be the sum of all read and write ops. For disk types that support burstable IOPS, I think there's another metric for the volume burst credits that you might also look at. We're working at the edge of my knowledge of EC2 here, though; you might need to consult AWS support for better advice.
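As a rough sketch (the volume ID and time range are placeholders), the raw op counts can be pulled from CloudWatch like this; dividing each datapoint's Sum by the period length in seconds gives the average IOPS for that window, and the BurstBalance metric shows the remaining gp2 burst credits:

  aws cloudwatch get-metric-statistics \
    --namespace AWS/EBS \
    --metric-name VolumeWriteOps \
    --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
    --start-time 2022-08-30T00:00:00Z --end-time 2022-08-30T06:00:00Z \
    --period 300 --statistics Sum
  # average IOPS per datapoint = (VolumeReadOps Sum + VolumeWriteOps Sum) / 300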

brandond commented Sep 01 '22