cloud-on-k8s

Support certificate chains when using custom certificates on the transport layer

Open prashant-warrier-echelonvi opened this issue 9 months ago • 3 comments

Operator Version

2.16.1

K8s Cluster Details

version: 1.30
distribution: Amazon EKS

Facts

The ECK operator logs this error on trying to create an Elasticsearch resource where the TLS certificates are obtained by cert-manager from Let's Encrypt:

only expected one PEM formated CA certificate in <namespace>/<secret-name>

The relevant log event looks like so:


{
  "log.level": "error",
  "@timestamp": "2025-03-25T10:20:43.840Z",
  "log.logger": "manager.eck-operator",
  "message": "Reconciler error",
  "service.version": "2.16.1+1f74bdd9",
  "service.type": "eck",
  "ecs.version": "1.4.0",
  "controller": "elasticsearch-controller",
  "object": {
    "name": "eck-qs",
    "namespace": "elasticsearch-clusters"
  },
  "namespace": "elasticsearch-clusters",
  "name": "eck-qs",
  "reconcileID": "ac688d7a-3448-4fae-87cd-5b6ae5a16e8d",
  "error": "only expected one PEM formated CA certificate in elasticsearch-clusters/eck-qs-tls",
  "errorCauses": [
    {
      "error": "only expected one PEM formated CA certificate in elasticsearch-clusters/eck-qs-tls",
      "errorVerbose": "only expected one PEM formated CA certificate in elasticsearch-clusters/eck-qs-tls\ngithub.com/elastic/cloud-on-k8s/v2/pkg/controller/common/certificates.parseCAFromSecret\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/common/certificates/ca_secret.go:56\ngithub.com/elastic/cloud-on-k8s/v2/pkg/controller/common/certificates.ParseCustomCASecret\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/common/certificates/ca_secret.go:32\ngithub.com/elastic/cloud-on-k8s/v2/pkg/controller/elasticsearch/certificates/transport.ReconcileOrRetrieveCA\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/certificates/transport/ca.go:77\ngithub.com/elastic/cloud-on-k8s/v2/pkg/controller/elasticsearch/certificates.ReconcileTransport\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/certificates/reconcile.go:112\ngithub.com/elastic/cloud-on-k8s/v2/pkg/controller/elasticsearch/driver.(*defaultDriver).Reconcile\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver/driver.go:234\ngithub.com/elastic/cloud-on-k8s/v2/pkg/controller/elasticsearch.(*ReconcileElasticsearch).internalReconcile\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/elasticsearch_controller.go:298\ngithub.com/elastic/cloud-on-k8s/v2/pkg/controller/elasticsearch.(*ReconcileElasticsearch).Reconcile\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/elasticsearch_controller.go:186\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224\nruntime.goexit\n\t/usr/lib/go/src/runtime/asm_amd64.s:1700"
    }
  ],
  "error.stack_trace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/root/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224"
}

Impact

This prevents Elasticsearch from being deployed when using TLS certificates issued by cert-manager with Let's Encrypt.

Per Elastic's documentation, tls.crt can contain a certificate chain. However, the ECK operator enforces a stricter requirement, rejecting secrets with more than one PEM-formatted certificate.

Details

We're trying to create an Elasticsearch resource (apiVersion: elasticsearch.k8s.elastic.co/v1) with the following spec:

auth:
  disableElasticUser: true
  fileRealm:
    - secretName: quickstart-file-realm-users
http:
  service:
    metadata: {}
    spec: {}
  tls:
    certificate:
      secretName: eck-qs-tls
    selfSignedCertificate:
      disabled: true
monitoring:
  logs: {}
  metrics: {}
nodeSets:
  - config:
      node.store.allow_mmap: false
    count: 3
    name: default
remoteClusterServer: {}
transport:
  service:
    metadata: {}
    spec: {}
  tls:
    certificate:
      secretName: eck-qs-tls
    certificateAuthorities: {}
    selfSignedCertificates:
      disabled: true
updateStrategy:
  changeBudget: {}
version: 8.17.3

The certificate secret being referred to in the spec above is generated by a Certificate resource controlled by cert-manager, and the certificate is issued by Let's Encrypt.

dnsNames:
  - <our ES's DNS>
duration: 2160h0m0s
issuerRef:
  kind: ClusterIssuer
  name: letsencrypt
privateKey:
  algorithm: RSA
  encoding: PKCS1
  size: 2048
renewBefore: 360h0m0s
secretName: eck-qs-tls
subject:

We're able to verify that the relevant secret gets created, and has these two keys:

kubectl get secrets -n elasticsearch-clusters eck-qs-tls -o yaml | yq -r '.data | keys'
- tls.crt
- tls.key

tls.crt is a chain of certificates, and it looks like so:


-----BEGIN CERTIFICATE-----
<PEM>
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
<PEM>
-----END CERTIFICATE-----

The keys in this secret are per the requirements stated here.

Digging Around

On digging around, I found this Go function:


func parseCAFromSecret(s corev1.Secret, keyFileName string, crtFileName string) (*CA, error) {
	// Validate private key
	key, exist := s.Data[keyFileName]
	if !exist {
		return nil, pkgerrors.Errorf("can't find private key %s in %s/%s", keyFileName, s.Namespace, s.Name)
	}
	privateKey, err := ParsePEMPrivateKey(key)
	if err != nil {
		return nil, pkgerrors.Wrapf(err, "can't parse private key %s in %s/%s", keyFileName, s.Namespace, s.Name)
	}
	// Validate CA certificate
	cert, exist := s.Data[crtFileName]
	if !exist {
		return nil, pkgerrors.Errorf("can't find certificate %s in %s/%s", crtFileName, s.Namespace, s.Name)
	}
	pubKeys, err := ParsePEMCerts(cert)
	if err != nil {
		return nil, pkgerrors.Wrapf(err, "can't parse CA certificate %s in %s/%s", crtFileName, s.Namespace, s.Name)
	}
	if len(pubKeys) != 1 {
		return nil, pkgerrors.Errorf("only expected one PEM formated CA certificate in %s/%s", s.Namespace, s.Name)
	}
	return NewCA(privateKey, pubKeys[0]), nil
}

This is the block that results in that error being logged:


	if len(pubKeys) != 1 {
		return nil, pkgerrors.Errorf("only expected one PEM formated CA certificate in %s/%s", s.Namespace, s.Name)
	}

I think this is the complete opposite of the documentation on this matter, which states that tls.crt can be a certificate or a chain.
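The length check above can be reproduced outside the operator. The following is a minimal sketch using only the Go standard library, not ECK's actual code: `countPEMCertificates` and `pemBlock` are hypothetical helpers, the certificate payloads are placeholders, and the sketch only counts PEM blocks without parsing their contents (what ECK's `ParsePEMCerts` does beyond that is not shown in this issue).

```go
package main

import (
	"encoding/pem"
	"fmt"
)

// countPEMCertificates returns the number of CERTIFICATE blocks in a PEM bundle.
// Hypothetical helper: it only counts blocks and does not parse the DER payload.
func countPEMCertificates(bundle []byte) int {
	count := 0
	for {
		var block *pem.Block
		block, bundle = pem.Decode(bundle)
		if block == nil {
			return count
		}
		if block.Type == "CERTIFICATE" {
			count++
		}
	}
}

// pemBlock wraps placeholder bytes in a syntactically valid CERTIFICATE block.
// The payload is NOT a real certificate; it only stands in for one.
func pemBlock(payload string) []byte {
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: []byte(payload)})
}

func main() {
	leaf := pemBlock("placeholder leaf")
	intermediate := pemBlock("placeholder intermediate")
	// A chain is simply several PEM blocks concatenated, as in the tls.crt above.
	chain := append(append([]byte{}, leaf...), intermediate...)

	fmt.Println("single certificate:", countPEMCertificates(leaf))
	fmt.Println("chain:", countPEMCertificates(chain))
}
```

A bundle like the tls.crt shown earlier yields a count of 2, which is the condition that `len(pubKeys) != 1` rejects.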

I initially thought this was a bug, but the problem stems from the fact that you are configuring the same TLS secret for the transport layer that you use for the HTTP layer of Elasticsearch:

transport:
  service:
    metadata: {}
    spec: {}
  tls:
    certificate:
      secretName: eck-qs-tls

Our documentation on the usage in this location says:

You can use a Kubernetes secret to provide your own CA instead of the self-signed certificate that ECK will then use to create node certificates for transport connections. The CA certificate must be stored in the secret under ca.crt and the private key must be stored under ca.key.

We currently do not support chains in the custom CAs that you can configure there. The API documentation you found for the TLSOptions applies only to the HTTP layer. The correct documentation for the transport layer is here.

The "Appears in" section of the API reference is intended to help you figure out which configuration applies where.

To fix your setup, you would need to either remove the transport.tls.certificate section and let ECK use self-signed certificates on the transport layer (which should normally not matter to you or any of your ES clients), or configure a secret with ca.crt (a single certificate, no chain) and ca.key.
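For the second option, one way to obtain a single-certificate CA is to generate a self-signed one. The following is a rough sketch using only the Go standard library, not an ECK utility: `newSelfSignedCA` and `parseSingleCA` are hypothetical names, and the common name, validity, key size, and PKCS#1 key encoding are illustrative assumptions (RSA 2048 with PKCS#1 mirrors the Certificate resource in the original report).

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// newSelfSignedCA returns a PEM-encoded self-signed CA certificate and its
// PKCS#1 RSA private key. Hypothetical helper, parameters are illustrative.
func newSelfSignedCA(commonName string, validDays int) (certPEM, keyPEM []byte, err error) {
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return nil, nil, err
	}
	// Random serial number, shifted by one so it is always positive.
	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 127))
	if err != nil {
		return nil, nil, err
	}
	serial.Add(serial, big.NewInt(1))

	now := time.Now()
	template := &x509.Certificate{
		SerialNumber:          serial,
		Subject:               pkix.Name{CommonName: commonName},
		NotBefore:             now.Add(-time.Minute),
		NotAfter:              now.AddDate(0, 0, validDays),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
	}
	// Template and parent are the same certificate, so the result is self-signed.
	der, err := x509.CreateCertificate(rand.Reader, template, template, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	certPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM = pem.EncodeToMemory(&pem.Block{Type: "RSA PRIVATE KEY", Bytes: x509.MarshalPKCS1PrivateKey(key)})
	return certPEM, keyPEM, nil
}

// parseSingleCA parses a PEM bundle and fails unless it holds exactly one block.
func parseSingleCA(certPEM []byte) (*x509.Certificate, error) {
	block, rest := pem.Decode(certPEM)
	if block == nil || block.Type != "CERTIFICATE" {
		return nil, errors.New("no PEM certificate found")
	}
	if next, _ := pem.Decode(rest); next != nil {
		return nil, errors.New("expected exactly one PEM block, found a chain")
	}
	return x509.ParseCertificate(block.Bytes)
}

func main() {
	certPEM, keyPEM, err := newSelfSignedCA("example-transport-ca", 365)
	if err != nil {
		panic(err)
	}
	cert, err := parseSingleCA(certPEM)
	if err != nil {
		panic(err)
	}
	fmt.Printf("ca.crt: subject=%q isCA=%t (%d bytes of PEM)\n", cert.Subject.CommonName, cert.IsCA, len(certPEM))
	fmt.Printf("ca.key: %d bytes of PEM\n", len(keyPEM))
}
```

The two PEM outputs would then be stored under the ca.crt and ca.key keys of the secret referenced by transport.tls.certificate.secretName.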

pebrc, Mar 26 '25 17:03

We currently do not support chains in the custom CAs that you can configure there

Why does TransportTLSOptions not support chains? Is there a specific reason?

thooooooomas, May 29 '25 22:05

We currently do not support chains in the custom CAs that you can configure there

Why does TransportTLSOptions not support chains? Is there a specific reason?

I don't think there is a specific reason other than the current limitations in the code. The transport layer cert handling was originally not designed to be user customisable so this restriction seemed acceptable at the time. We can look into loosening this restriction.

pebrc, May 30 '25 12:05