CockroachDB causes problems with K3s using Postgres driver

Open mbrancato opened this issue 3 years ago • 4 comments

I'm able to connect to CockroachDB with K3s and Kine, but K3s does not work with CockroachDB as its datastore. I'm not sure what I can provide here besides log output.

I do get a lot of RBAC errors like this:

heduler" cannot list resource "storageclasses" in API group "storage.k8s.io" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.312697   14160 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.StatefulSet: statefulsets.apps is forbidden: User "system:kube-scheduler" cannot list resource "statefulsets" in API group "apps" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.312887   14160 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Node: nodes is forbidden: User "system:kube-scheduler" cannot list resource "nodes" in API group "" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.325938   14160 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.PersistentVolume: persistentvolumes is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumes" in API group "" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.333258   14160 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.ReplicaSet: replicasets.apps is forbidden: User "system:kube-scheduler" cannot list resource "replicasets" in API group "apps" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.342171   14160 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.CSINode: csinodes.storage.k8s.io is forbidden: User "system:kube-scheduler" cannot list resource "csinodes" in API group "storage.k8s.io" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.356777   14160 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.Service: services is forbidden: User "system:kube-scheduler" cannot list resource "services" in API group "" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.362120   14160 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:kube-scheduler" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.367089   14160 reflector.go:153] k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:246: Failed to list *v1.Pod: pods is forbidden: User "system:kube-scheduler" cannot list resource "pods" in API group "" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.387236   14160 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.PersistentVolumeClaim: persistentvolumeclaims is forbidden: User "system:kube-scheduler" cannot list resource "persistentvolumeclaims" in API group "" at the cluster scope
Jul 11 10:15:39 virt-0 k3s[14160]: E0711 10:15:39.391674   14160 reflector.go:153] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.ReplicationController: replicationcontrollers is forbidden: User "system:kube-scheduler" cannot list resource "replicationcontrollers" in API group "" at the cluster scope

After a while, I get some pq errors:

Jul 11 10:16:13 virt-0 k3s[14160]: time="2020-07-11T10:16:13.396427502-04:00" level=error msg="error while range on /registry/deployments /registry/deployments: pq: internal error while retrieving user account"
Jul 11 10:16:17 virt-0 k3s[14160]: time="2020-07-11T10:16:17.834086756-04:00" level=error msg="error while range on /registry/configmaps/kube-system/k3s : pq: internal error while retrieving user account"
Jul 11 10:16:17 virt-0 k3s[14160]: time="2020-07-11T10:16:17.834596643-04:00" level=error msg="error while range on /registry/ranges/servicenodeports : pq: internal error while retrieving user account"
Jul 11 10:16:17 virt-0 k3s[14160]: time="2020-07-11T10:16:17.834881953-04:00" level=error msg="error while range on /registry/namespaces/default : pq: internal error while retrieving user account"
Jul 11 10:16:17 virt-0 k3s[14160]: E0711 10:16:17.836634   14160 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"pq: internal error while retrieving user account"
, Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jul 11 10:16:17 virt-0 k3s[14160]: time="2020-07-11T10:16:17.835201049-04:00" level=error msg="error while range on /registry/ranges/serviceips : pq: internal error while retrieving user account"
Jul 11 10:16:17 virt-0 k3s[14160]: E0711 10:16:17.838544   14160 repair.go:100] unable to refresh the service IP block: rpc error: code = Unknown desc = pq: internal error while retrieving user account
Jul 11 10:16:17 virt-0 k3s[14160]: E0711 10:16:17.837511   14160 repair.go:73] unable to refresh the port allocations: rpc error: code = Unknown desc = pq: internal error while retrieving user account
Jul 11 10:16:17 virt-0 k3s[14160]: E0711 10:16:17.838010   14160 status.go:71] apiserver received an error that is not an metav1.Status: &status.statusError{Code:2, Message:"pq: internal error while retrieving user account"
, Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
Jul 11 10:16:17 virt-0 k3s[14160]: E0711 10:16:17.840814   14160 leaderelection.go:331] error retrieving resource lock kube-system/k3s: rpc error: code = Unknown desc = pq: internal error while retrieving user account

Additionally, when using certificate auth, K3s eventually restarts while waiting for some CRDs to complete. I'm not sure whether that is specific to CockroachDB, though.

mbrancato, Jul 11 '20 14:07

I also had an issue running CockroachDB and, if memory serves, I saw similar errors. My problem was related to the ID field: in CockroachDB, the SERIAL type is only a pseudo data type provided for compatibility with Postgres, and the generated IDs are unique but not sequential (they behave more like a UUID). I think what I ended up doing was adding this to the connection string:

experimental_serial_normalization=sql_sequence
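
As a rough sketch of what "adding this to the connection string" could look like, the Python snippet below appends the parameter to the query string of a Postgres-style URL. The host name, user, database name, and `sslmode` value are hypothetical placeholders, and the helper `with_session_var` is made up for this illustration. It assumes the client driver forwards unrecognized URL parameters to the server as session variables, which may not hold for every driver or CockroachDB version.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_session_var(endpoint: str, name: str, value: str) -> str:
    """Return `endpoint` with `name=value` set in its query string.

    An existing parameter with the same name is replaced, so calling
    this twice does not duplicate the setting.
    """
    parts = urlsplit(endpoint)
    params = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != name
    ]
    params.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Hypothetical endpoint: host, user, and database are placeholders.
    base = "postgres://k3s@crdb.example.internal:26257/kine?sslmode=verify-full"
    print(with_session_var(base, "experimental_serial_normalization", "sql_sequence"))
```

The resulting URL is what would then be handed to K3s as its datastore endpoint (for example via `--datastore-endpoint`).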

Some more details here: https://www.cockroachlabs.com/docs/stable/serial.html https://www.cockroachlabs.com/docs/stable/experimental-features.html#session-variables

ChrisRx, Sep 01 '20 16:09

FYI, the current way of enabling sql_sequence is to run SET CLUSTER SETTING sql.defaults.serial_normalization=2; using an admin account. Note that this is a cluster-wide setting.
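
One way to apply the statement above from a script is to pass it to the `cockroach sql` client. The sketch below only builds and prints the command line; it does not run anything. The admin URL is a hypothetical placeholder, the helper `cockroach_sql_argv` is invented for this example, and the follow-up `SHOW CLUSTER SETTING` check is an addition not mentioned in the thread.

```python
import shlex

# Statement quoted in the comment above; per that comment, 2 selects sql_sequence.
SET_STMT = "SET CLUSTER SETTING sql.defaults.serial_normalization = 2;"
# Assumed verification step, not part of the original thread.
SHOW_STMT = "SHOW CLUSTER SETTING sql.defaults.serial_normalization;"


def cockroach_sql_argv(admin_url: str, statements: list[str]) -> list[str]:
    """Build an argv list for `cockroach sql`, one --execute per statement."""
    argv = ["cockroach", "sql", "--url", admin_url]
    for stmt in statements:
        argv += ["--execute", stmt]
    return argv


if __name__ == "__main__":
    # Hypothetical admin connection URL; replace with a real admin account.
    url = "postgres://root@crdb.example.internal:26257/defaultdb?sslmode=verify-full"
    # Print a shell-quoted command for review instead of executing it.
    print(shlex.join(cockroach_sql_argv(url, [SET_STMT, SHOW_STMT])))
```

Because the setting is cluster-wide, it affects every database on the cluster, not only the one Kine uses.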

dotkrnl, Feb 08 '21 04:02

Setting sql.defaults.serial_normalization this way works perfectly, and I now have a k3s cluster up and running. It solved my issue as described in k3s-io/k3s#2613, although the symptoms there differ slightly from those in this issue (I do not see any pq internal errors). The documentation is now at https://www.cockroachlabs.com/docs/v20.2/cluster-settings.html .

To close this issue, you would probably want to add this to the k3s or kine documentation, as it is no longer an experimental feature in CockroachDB. Alternatively, you may want to support non-sequential IDs for CockroachDB in a dedicated connection driver. CockroachDB can be a great option as the HA database for k3s, IMHO.

dotkrnl, Feb 08 '21 04:02

We don't have a CockroachDB-specific driver. Technically, it's not supported: we've only tested with PostgreSQL itself, not with any of the various projects that offer a compatible interface.

brandond, Feb 08 '21 09:02