Elastic operator renewing Elasticsearch internal certificates breaks Stack Monitoring

Open Ricardolaponder opened this issue 2 years ago • 4 comments

Bug Report

What did you do?

We have deployed Elasticsearch on Kubernetes with ECK. For monitoring, we deployed a separate monitoring cluster and use Stack Monitoring with Beats to ship data from our production cluster to it. This worked fine until ECK renewed the internal certificates that the Elasticsearch cluster uses for internal communication.

What did you expect to see?

Logs and metrics from the production cluster in Stack Monitoring on the monitoring cluster, both before and after the certificate change.

What did you see instead? Under which circumstances?

After the certificate change, only logs from the production cluster arrived; Metricbeat stopped sending metrics.

Environment

  • ECK version: 1.9.0

  • Elasticsearch version: 7.16.3

  • Kubernetes information:

    • On premise
    • Kubernetes distribution: Rancher
    • Kubernetes version: v1.21.9
  • Logs:

Elasticsearch nodes gave this message: 

`[2022-03-07T12:47:39,404][WARN ][o.e.h.AbstractHttpServerTransport] [elasticsearch-es-master-2] caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/127.0.0.1:9200, remoteAddress=/127.0.0.1:38980}
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate`

Metricbeat gave this error:
`2022-03-07T15:33:38.709Z        ERROR   module/wrapper.go:259   Error fetching data for metricset elasticsearch.node_stats: error making http request: Get "https://localhost:9200/_nodes/_local/stats": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "elasticsearch-http")`

I saw that the new internal CA was mounted in the Metricbeat container, but I had to restart the Metricbeat container to fix the issue. Filebeat's certificate verification mode is set to [certificate](https://www.elastic.co/guide/en/beats/filebeat/current/configuration-ssl.html#client-verification-mode).

Ricardolaponder avatar Mar 09 '22 09:03 Ricardolaponder

Yes, I see the problem. The new certificates (from both the monitored cluster and the monitoring cluster) are correctly propagated to the Metricbeat container. Metricbeat uses a persistent connection, so as long as the connection stays established it keeps working, even if the certificate has expired. As soon as the connection is closed, Metricbeat tries to reconnect using the old certificate, without taking the new one into account, and gets the PKI error `x509: certificate signed by unknown authority`.

Temporary workaround: kill the Beat process so that the Beat container is restarted (`kubectl exec $esPod -c metricbeat -- kill 1`).
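If the cluster has several nodes, the same workaround can be applied to every pod in one pass. The following is only a rough sketch: the function name `restart_metricbeat_sidecars` is made up for illustration, and the default cluster name (`elasticsearch`, guessed from the pod name `elasticsearch-es-master-2` in the logs) and namespace (`default`) are assumptions to adjust for your deployment. It selects pods through the `elasticsearch.k8s.elastic.co/cluster-name` label that ECK sets on Elasticsearch pods.

```shell
#!/usr/bin/env bash
# Sketch: run the "kill 1" workaround in the metricbeat container of every
# Elasticsearch pod belonging to one ECK-managed cluster.
# Assumptions (not from the issue): the function name and both defaults below.

restart_metricbeat_sidecars() {
  local cluster="${1:-elasticsearch}"   # assumed cluster name
  local namespace="${2:-default}"       # assumed namespace
  local pod

  # "-o name" prints one "pod/<name>" entry per line.
  for pod in $(kubectl get pods -n "$namespace" \
      -l "elasticsearch.k8s.elastic.co/cluster-name=${cluster}" \
      -o name); do
    echo "restarting metricbeat in ${pod}"
    # Killing PID 1 ends the container; Kubernetes restarts it, and the new
    # Metricbeat process loads the CA that is currently mounted.
    kubectl exec -n "$namespace" "$pod" -c metricbeat -- kill 1
  done
}
```

Example call: `restart_metricbeat_sidecars elasticsearch my-namespace`. This has to be repeated after each certificate renewal, so it remains a stopgap rather than a fix.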

thbkrkr avatar Mar 10 '22 13:03 thbkrkr

@thbkrkr Issue still happens on ECK 2.6.1 + Stack 8.8.1. It looks like https://github.com/elastic/beats/pull/34416 does not really help. What's our plan to fix this issue?

milanage avatar Sep 21 '23 05:09 milanage

This still happens on ECK 2.10 & Stack 8.12.0

VCCPlindsten avatar Feb 02 '24 15:02 VCCPlindsten

+1

KannappanSomu avatar Feb 02 '24 15:02 KannappanSomu