Adding XSITE to the existing cluster breaks the cluster

Open andrey-dubnik opened this issue 10 months ago • 5 comments

Hi,

We have a set of clusters running on v14 and v15. Adding XSITE to a new cluster works fine, but adding XSITE to an existing cluster that does not have it fails and also breaks the server: the StatefulSet is not updated with the new volume/secret configuration, while the ISPN config file does get updated, and as a result the cluster can't start.

Error

08:52:39,658 FATAL (main) [org.infinispan.SERVER] ISPN080028: Infinispan Server failed to start org.infinispan.commons.CacheConfigurationException: /etc/encrypt/transport-site-tls/keystore.p12 (No such file or directory)

Cause of the error

The STS/Pod spec lacks the XSITE volume mounts in its config:

    - mountPath: /etc/encrypt/transport-site-tls
      name: encrypt-transport-site-tls-volume
    - mountPath: /etc/encrypt/truststore-site-tls
      name: encrypt-truststore-site-tls-volume

and

  - name: encrypt-transport-site-tls-volume
    secret:
      defaultMode: 420
      secretName: xsite-keystore
  - name: encrypt-truststore-site-tls-volume
    secret:
      defaultMode: 420
      secretName: xsite-truststore
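
To confirm the mismatch on an affected cluster, the StatefulSet JSON can be checked for the two mounts listed above. This is only a minimal sketch and not part of the operator: the function name `missing_xsite_mounts` and the `EXPECTED_MOUNTS` table are hypothetical, and the mount paths and volume names are copied from the snippets in this report.

```python
import json
import sys

# Mount paths and volume names copied from this report (assumed to be
# what the operator generates when XSITE encryption is configured).
EXPECTED_MOUNTS = {
    "/etc/encrypt/transport-site-tls": "encrypt-transport-site-tls-volume",
    "/etc/encrypt/truststore-site-tls": "encrypt-truststore-site-tls-volume",
}


def missing_xsite_mounts(statefulset: dict) -> list:
    """Return the expected XSITE mount paths that the StatefulSet lacks.

    A path counts as present only if a container mounts it under the
    expected volume name AND that volume is declared in the pod spec.
    """
    pod_spec = (statefulset.get("spec") or {}).get("template", {}).get("spec", {})
    declared = {v.get("name") for v in pod_spec.get("volumes") or []}
    mounted = set()
    for container in pod_spec.get("containers") or []:
        for mount in container.get("volumeMounts") or []:
            mounted.add((mount.get("mountPath"), mount.get("name")))

    missing = []
    for path, volume in EXPECTED_MOUNTS.items():
        if (path, volume) not in mounted or volume not in declared:
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Reads the StatefulSet JSON from stdin and prints any missing paths.
    for path in missing_xsite_mounts(json.load(sys.stdin)):
        print("missing:", path)
```

It can be fed the output of `kubectl get statefulset <name> -n <namespace> -o json` on stdin. If it prints both paths while the server config already references `/etc/encrypt/transport-site-tls/keystore.p12`, that matches the failure described here.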

andrey-dubnik avatar Apr 02 '24 08:04 andrey-dubnik

Also, the operator logs indicate that it does detect and update the xsite configuration for the cluster:

2024-04-02T08:51:25.504Z	INFO	controllers.Infinispan	Found deployments with status 	{"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north", "starting": ["accelerextestapp-dev-cert-testing-north-1"], "ready": ["accelerextestapp-dev-cert-testing-north-0"]}
2024-04-02T08:51:26.401Z	INFO	controllers.Infinispan	Cluster not well-formed, retrying ...	{"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north"}
2024-04-02T08:51:26.401Z	INFO	controllers.Infinispan	podList.Items=2, i.Spec.Replicas=2	{"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north"}
2024-04-02T08:51:26.401Z	INFO	controllers.Infinispan	Done	{"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north", "requeue": true, "requeueAfter": "15s", "error": null}
2024-04-02T08:51:41.402Z	INFO	controllers.Infinispan	x-site configured	{"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north", "configuration": {"GossipRouter":{"Keystore":null,"Truststore":null},"MaxRelayNodes":2,"Sites":[{"Address":"accelerextestapp-dev-cert-testing-north-site","Name":"dev-northeurope-01","Port":65534,"IgnoreGossipRouter":false},{"Address":"xsite-accelerextestapp-dev-cert-testing-west.cache.maersk.io","Name":"dev-westeurope-01","Port":65534,"IgnoreGossipRouter":false}],"HeartbeatEnabled":true,"HeartbeatInterval":10000,"HeartbeatTimeout":30000}}
2024-04-02T08:51:41.402Z	INFO	controllers.Infinispan.xsite	Transport TLS Configured.	{"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north", "Keystore": "keystore.p12", "Secret Name": "xsite-keystore"}
2024-04-02T08:51:41.402Z	INFO	controllers.Infinispan.xsite	Found Truststore.	{"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north", "Truststore": "truststore.p12", "Secret Name": "xsite-truststore"}
2024-04-02T08:51:41.402Z	INFO	controllers.Infinispan.GossipRouter	TLS Configured.	{"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north", "Keystore": "keystore.p12", "Secret Name": "xsite-keystore"}
2024-04-02T08:51:41.481Z	INFO	controllers.Infinispan.GossipRouter	Cross-site deployment 'accelerextestapp-dev-cert-testing-north-router' updated	{"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north"}
2024-04-02T08:51:41.483Z	INFO	controllers.Infinispan	Found deployments with status 	{"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north", "starting": ["accelerextestapp-dev-cert-testing-north-1"], "ready": ["accelerextestapp-dev-cert-testing-north-0"]}

andrey-dubnik avatar Apr 02 '24 09:04 andrey-dubnik

Here is also a snippet of the configuration:

  service:
    type: DataGrid
    container:
      storage: 1Gi
    sites:  
      local:
        name: dev-westeurope-01
        discovery:
          launchGossipRouter: true
          memory: "2Gi:1Gi"
          cpu: "2000m:1000m"
        expose:
          type: LoadBalancer
          port: 65534
        maxRelayNodes: 2
        encryption:
          protocol: TLSv1.2
          transportKeyStore:
            secretName: xsite-keystore
            alias: xsite
            filename: keystore.p12
          routerKeyStore:
            secretName: xsite-keystore
            alias: xsite
            filename: keystore.p12
          trustStore:
            secretName: xsite-truststore
            filename: truststore.p12

andrey-dubnik avatar Apr 02 '24 12:04 andrey-dubnik

Thanks for raising the issue @andrey-dubnik, we'll try to take a look at this soon.

ryanemerson avatar Apr 03 '24 08:04 ryanemerson

Did you get a chance to see what may be wrong with it?

andrey-dubnik avatar May 10 '24 18:05 andrey-dubnik

@andrey-dubnik This issue has been added to our backlog, but no progress has been made yet. We'll make sure to reference this issue when a PR is raised.

ryanemerson avatar May 14 '24 14:05 ryanemerson