etcd-operator
EtcdBackup unable to create etcd endpoint.
I am running a working etcd cluster with Vault on it. Vault is working correctly and so is the cluster itself.
Name: vault-etcd
Namespace: my-namespace
Labels: app=vault
vault_cr=vault
Annotations: <none>
API Version: etcd.database.coreos.com/v1beta2
Kind: EtcdCluster
Metadata:
Cluster Name:
Creation Timestamp: 2018-08-29T14:04:48Z
Generation: 0
Owner References:
API Version: vault.banzaicloud.com/v1alpha1
Controller: true
Kind: Vault
Name: vault
UID: 7c63314b-ab94-11e8-bd5c-0626c6bac6fc
Resource Version: 4246480
Self Link: /apis/etcd.database.coreos.com/v1beta2/namespaces/my-namespace/etcdclusters/vault-etcd
UID: 7d891555-ab94-11e8-bd5c-0626c6bac6fc
Spec:
TLS:
Static:
Member:
Peer Secret: vault-etcd-tls
Server Secret: vault-etcd-tls
Operator Secret: vault-etcd-tls
Repository: quay.io/coreos/etcd
Size: 3
Version: 3.1.15
Status:
Client Port: 2379
Conditions:
Last Transition Time: 2018-08-29T14:05:31Z
Last Update Time: 2018-08-29T14:05:31Z
Reason: Cluster available
Status: True
Type: Available
Current Version: 3.1.15
Members:
Ready:
vault-etcd-475x979hr9
vault-etcd-fqsvhxrhl4
vault-etcd-lrdzr5gsqn
Phase: Running
Service Name: vault-etcd-client
Size: 3
Target Version:
Events: <none>
I have created an EtcdBackup to make a backup to an S3 bucket, but it keeps failing, and I can't figure out why. KubeDNS is working and the endpoint is correct.
Name: backup-vault-etcd-20180828-1045
Namespace: my-namespace
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"etcd.database.coreos.com/v1beta2","kind":"EtcdBackup","metadata":{"annotations":{},"name":"backup-vault-etcd-20180828-1045","namespace":...
API Version: etcd.database.coreos.com/v1beta2
Kind: EtcdBackup
Metadata:
Cluster Name:
Creation Timestamp: 2018-08-30T09:27:39Z
Generation: 0
Resource Version: 4244170
Self Link: /apis/etcd.database.coreos.com/v1beta2/namespaces/my-namespace/etcdbackups/backup-vault-etcd-20180828-1045
UID: f071b973-ac36-11e8-b2fa-0626c6bac6fc
Spec:
Client TLS Secret: vault-etcd-tls
Etcd Endpoints:
https://vault-etcd-client:2379
S 3:
Aws Secret: etcd-operator
Path: etcd-backups/vault-etcd-20180828-1045
Storage Type: S3
Status:
Reason: failed to save snapshot (create etcd client failed: failed to get etcd client with maximum kv store revision: could not create an etcd client for the max revision purpose from given endpoints ([https://vault-etcd-client:2379]))
Succeeded: false
Events: <none>
The k8s secret vault-etcd-tls contains everything needed.
Name: vault-etcd-tls
Namespace: my-namespace
Labels: app=vault
vault_cr=vault-etcd
Annotations: <none>
Type: Opaque
Data
====
peer.crt: 1342 bytes
peer.key: 1675 bytes
server.crt: 1330 bytes
server.key: 1679 bytes
etcd-client.crt: 1131 bytes
peer-ca.crt: 1143 bytes
server-ca.crt: 1143 bytes
etcd-client-ca.crt: 1143 bytes
etcd-client.key: 1679 bytes
I think the fact that the backup-operator keeps failing is a bug, because I can't find a configuration mistake.
Looking through the error logs and then at the code, I have found at least an optimization for the error logging.
if maxClient == nil {
	return nil, 0, fmt.Errorf("could not create an etcd client for the max revision purpose from given endpoints (%v)", endpoints)
}
var err error
if len(errors) > 0 {
	errorStr := ""
	for _, errStr := range errors {
		errorStr += errStr + "\n"
	}
	err = fmt.Errorf(errorStr)
}
This should be changed, because the specific errors (failed to create a client, or failed to get the revision) are not printed to the log when no client could be created. I think the error aggregation should be moved before the maxClient == nil check.
As it stands, I can't tell whether it fails to get the revision from the endpoint or fails to create an etcd client.
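For illustration, a minimal sketch of the reordering I have in mind (same variable names as the excerpt above; this is a proposal, not the upstream implementation):

// Build the aggregated per-endpoint error first, then attach it to the
// "no usable client" error, so the log shows whether client creation or
// the revision lookup failed for each endpoint.
var err error
if len(errors) > 0 {
	errorStr := ""
	for _, errStr := range errors {
		errorStr += errStr + "\n"
	}
	err = fmt.Errorf(errorStr)
}
if maxClient == nil {
	return nil, 0, fmt.Errorf("could not create an etcd client for the max revision purpose from given endpoints (%v): %v", endpoints, err)
}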
I still have this problem. Any thoughts?
I also have the same issue:
time="2018-09-12T13:12:45Z" level=error msg="error syncing etcd backup (vault/etcd-cluster): failed to save snapshot (create etcd client failed: failed to get etcd client with maximum kv store revision: could not create an etcd client for the max revision purpose from given endpoints ([https://etcd-cluster-client.vault:2379]))" pkg=controller
@shebanian in my case I could solve the issue by using .svc in the etcd cluster endpoint url:
This one generates the fault: https://etcd-cluster-client.vault:2379
But when changing the endpoint to https://etcd-cluster-client.vault.svc:2379 the backup is saved successfully.
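For context, a rough, hypothetical sketch of how such an endpoint ends up being dialed (this is not the operator's actual code; the endpoint, timeout, and TLS wiring below are assumptions): the endpoint string from the EtcdBackup spec is passed to the etcd client as-is, so it must resolve from the backup operator's pod and, when TLS is enabled, match a SAN in the server certificate.

package main

import (
	"crypto/tls"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	// Hypothetical client TLS config; in the operator this would be built
	// from the certificates in the clientTLSSecret.
	var tlsConfig *tls.Config

	// The endpoint is dialed exactly as written in the EtcdBackup spec, so
	// the namespace-qualified service FQDN must be resolvable from here.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://etcd-cluster-client.vault.svc:2379"},
		DialTimeout: 5 * time.Second,
		TLS:         tlsConfig,
	})
	if err != nil {
		log.Fatalf("create etcd client failed: %v", err)
	}
	defer cli.Close()
}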
@salkin I have tried your solution but it doesn't help. I still keep getting the same error.
My 2 cents: this seems to work on clusters created in the namespace by etcd-operator, but it doesn't seem to work on my external clusters. I gave my backup CR the endpoints and the AWS credentials, but it still won't take the backup for me. Debugging is much harder simply because I don't know what is actually failing; I keep getting that same error in my logs too.
Side note: can anyone show me how to use etcd-backup-operator to backup external clusters? tyvm
Hi, I have also faced a similar issue. It seems you have a problem connecting to etcd. You need to configure the etcd certificate with the same name that they have suggested.
Hi guys, I've also run into this issue, and the fix is to add
ClientTLSSecret: vault-cluster-etcd-client-tls
The ClientTLSSecret value must exactly match the secret name shown in kubectl get secret.
apiVersion: "etcd.database.coreos.com/v1beta2" kind: "EtcdBackup" metadata: name: etcd-cluster-backup spec: etcdEndpoints: ["https://vault-cluster-etcd-client:2379"] ClientTLSSecret: vault-cluster-etcd-client-tls storageType: S3 s3: path: vault-backup-bucket/TH/MGT/Openshift/non-production-cluster-1.bkp awsSecret: aws
Enjoy!
I have this error also, without using SSL (for now):
"Reason": "failed to save snapshot (create etcd client failed: failed to get etcd client with maximum kv store revision: could not create an etcd client for the max revision purpose from given endpoints ([http://vault-etcd-cluster-client.secrets:2379]))",
"etcdRevision": 2822,
"etcdVersion": "3.3.12",
"lastSuccessDate": "2019-03-21T22:46:39Z",
"succeeded": false
But I am still getting backup files in the bucket and I see the following log in the operator console:
amazing-dog-etcd-operator-etcd-backup-operator-5c5fbdbcb8-968zr etcd-backup-operator 2019-03-21T22:46:39.208425881Z time="2019-03-21T22:46:39Z" level=info msg="getMaxRev: endpoint http://vault-etcd-cluster-client.secrets:2379 revision (2822)"
Seeing the same issue with backup operator 0.9.4:
- no TLS
- operator and cluster reside in separate namespaces
- backup CR is configured in the operator's namespace and connects to the service FQDN http://test.etcd.svc.cluster.local:2379
Seconds after the following log line appears in the logs:
time="2019-05-02T08:56:54Z" level=info msg="getMaxRev: endpoint http://test-client.etcd.svc.cluster.local:2379 revision (4)"
which indicates that the connection was successful, I can see for a brief moment that the backup was successful (the S3 bucket also gets updated). After a couple of seconds (significantly less than the backup interval), though, the backup status changes to failed with the following reason:
failed to save snapshot (create etcd client failed: failed to get etcd client with maximum kv store revision: could not create an etcd client for the max revision purpose from given endpoints ([http://test-client.etcd.svc.cluster.local:2379]))
Restart of the backup operator pod fixes the issue.
The steps to reproduce this behaviour seem to be:
- Configure wrong S3 credentials and wait for the backup to fail
- Restore the correct configuration and use something like
watch -n 1 -d kubectl describe etcdbackup
to observe the faulty behaviour
- Restart the backup-operator pod with
kubectl delete pod
and watch the issue disappear
how to use etcd-backup-operator to backup external clusters
Did you fix it? I have the same scenario as you.
I can confirm that @selfieblue's solution works, although the name he provided is wrong.
Use this etcd-backup.yml as a reference:
apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdBackup"
metadata:
name: gcs-vault-backup
spec:
etcdEndpoints:
- https://vault-etcd-client:2379
clientTLSSecret: vault-etcd-client-tls
storageType: GCS
backupPolicy:
backupIntervalInSecond: 3600
maxBackups: 48
gcs:
path: my-bucket-name/vault.backup
gcpSecret: gcs-vault-credentials
Adding clientTLSSecret: vault-etcd-client-tls fixes the issue. Make sure to use the TLS secret of your etcd client (kubectl get secrets | grep client-tls) and try again. A successful backup will result in the following:
$ kubectl logs -f deployments/etcd-operator etcd-backup-operator
time="2019-10-10T07:20:31Z" level=info msg="getMaxRev: endpoint https://vault-etcd-client:2379 revision (1978)"
Note that if you've done everything correctly, you should see the Client TLS Secret reference appearing when you execute this command:
$ kubectl describe etcdbackups.etcd.database.coreos.com gcs-vault-backup
Name: gcs-vault-backup
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"etcd.database.coreos.com/v1beta2","kind":"EtcdBackup","metadata":{"annotations":{},"name":"gcs-vault-backup","namespa...
API Version: etcd.database.coreos.com/v1beta2
Kind: EtcdBackup
Metadata:
Creation Timestamp: 2019-10-10T07:19:02Z
Finalizers:
backup-operator-periodic
Generation: 33
Resource Version: 189086934
Self Link: /apis/etcd.database.coreos.com/v1beta2/namespaces/default/etcdbackups/gcs-vault-backup
UID: 158ff07d-86e2-4cfa-b7ac-618d25662cf7
Spec:
Backup Policy:
Backup Interval In Second: 3600
Max Backups: 48
Client TLS Secret: vault-etcd-client-tls
Etcd Endpoints:
https://vault-etcd-client:2379
Gcs:
Gcp Secret: gcs-vault-credentials
Path: my-bucket-name/vault.backup
Storage Type: GCS
I was also able to remedy this issue by following @alex-goncharov's recommendation. All my configs were correct, though I had been monkeying around with the AWS secrets for a bit before getting them right. With everything back in alignment, it still kept failing, but it turns out I needed to delete-restart the etcd-operator-etcd-backup pod (which spawns a new one via the deployment) and then delete and recreate the EtcdBackup custom resource. Doing those two things was all I needed to get it working again. Thanks for the pro tip, Alex.