infinispan-operator
Adding XSITE to the existing cluster breaks the cluster
Hi,
We have a set of clusters running on v14 and v15. Adding XSITE when creating a new cluster works fine, but adding XSITE to an existing cluster that did not have it does not work, and it also breaks the server. The StatefulSet is not updated with the new volume/secret configuration, but the ISPN config file does get updated, and as a result the cluster can't start.
error
08:52:39,658 FATAL (main) [org.infinispan.SERVER] ISPN080028: Infinispan Server failed to start org.infinispan.commons.CacheConfigurationException: /etc/encrypt/transport-site-tls/keystore.p12 (No such file or directory)
Cause of the error
The STS/pod is missing the XSITE volume mounts in its config:
- mountPath: /etc/encrypt/transport-site-tls
  name: encrypt-transport-site-tls-volume
- mountPath: /etc/encrypt/truststore-site-tls
  name: encrypt-truststore-site-tls-volume
and
- name: encrypt-transport-site-tls-volume
  secret:
    defaultMode: 420
    secretName: xsite-keystore
- name: encrypt-truststore-site-tls-volume
  secret:
    defaultMode: 420
    secretName: xsite-truststore
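The missing entries listed above can be checked mechanically. This is only a minimal sketch, assuming the StatefulSet has been exported as JSON and loaded into a Python dict; the function name `missing_xsite_config` is hypothetical, and the expected volume names, mount paths, and secret names are copied from this issue rather than taken from the operator's source.

```python
# Expected cross-site entries, copied from the issue text (not from operator code).
EXPECTED_MOUNTS = {
    "encrypt-transport-site-tls-volume": "/etc/encrypt/transport-site-tls",
    "encrypt-truststore-site-tls-volume": "/etc/encrypt/truststore-site-tls",
}
EXPECTED_SECRETS = {
    "encrypt-transport-site-tls-volume": "xsite-keystore",
    "encrypt-truststore-site-tls-volume": "xsite-truststore",
}


def missing_xsite_config(statefulset: dict) -> dict:
    """Return the expected volume and volumeMount names absent from a StatefulSet dict."""
    pod_spec = statefulset["spec"]["template"]["spec"]

    # Map each volume name to the secret it references (None if not a secret volume).
    volumes = {
        v["name"]: v.get("secret", {}).get("secretName")
        for v in pod_spec.get("volumes", [])
    }

    # Collect volume mounts across all containers of the pod template.
    mounts = {}
    for container in pod_spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            mounts[mount["name"]] = mount["mountPath"]

    return {
        "volumes": sorted(
            name for name, secret in EXPECTED_SECRETS.items()
            if volumes.get(name) != secret
        ),
        "volumeMounts": sorted(
            name for name, path in EXPECTED_MOUNTS.items()
            if mounts.get(name) != path
        ),
    }
```

Feeding it the output of `kubectl get statefulset <name> -o json` (parsed with `json.load`) should return two empty lists for a cluster created with XSITE, and both volume names in each list for a StatefulSet in the broken state described here.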
The operator logs also indicate that it does detect and update the XSITE configuration for the cluster:
2024-04-02T08:51:25.504Z INFO controllers.Infinispan Found deployments with status {"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north", "starting": ["accelerextestapp-dev-cert-testing-north-1"], "ready": ["accelerextestapp-dev-cert-testing-north-0"]}
2024-04-02T08:51:26.401Z INFO controllers.Infinispan Cluster not well-formed, retrying ... {"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north"}
2024-04-02T08:51:26.401Z INFO controllers.Infinispan podList.Items=2, i.Spec.Replicas=2 {"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north"}
2024-04-02T08:51:26.401Z INFO controllers.Infinispan Done {"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north", "requeue": true, "requeueAfter": "15s", "error": null}
2024-04-02T08:51:41.402Z INFO controllers.Infinispan x-site configured {"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north", "configuration": {"GossipRouter":{"Keystore":null,"Truststore":null},"MaxRelayNodes":2,"Sites":[{"Address":"accelerextestapp-dev-cert-testing-north-site","Name":"dev-northeurope-01","Port":65534,"IgnoreGossipRouter":false},{"Address":"xsite-accelerextestapp-dev-cert-testing-west.cache.maersk.io","Name":"dev-westeurope-01","Port":65534,"IgnoreGossipRouter":false}],"HeartbeatEnabled":true,"HeartbeatInterval":10000,"HeartbeatTimeout":30000}}
2024-04-02T08:51:41.402Z INFO controllers.Infinispan.xsite Transport TLS Configured. {"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north", "Keystore": "keystore.p12", "Secret Name": "xsite-keystore"}
2024-04-02T08:51:41.402Z INFO controllers.Infinispan.xsite Found Truststore. {"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north", "Truststore": "truststore.p12", "Secret Name": "xsite-truststore"}
2024-04-02T08:51:41.402Z INFO controllers.Infinispan.GossipRouter TLS Configured. {"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north", "Keystore": "keystore.p12", "Secret Name": "xsite-keystore"}
2024-04-02T08:51:41.481Z INFO controllers.Infinispan.GossipRouter Cross-site deployment 'accelerextestapp-dev-cert-testing-north-router' updated {"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north"}
2024-04-02T08:51:41.483Z INFO controllers.Infinispan Found deployments with status {"infinispan": "distributed-cache-tenant-dev/accelerextestapp-dev-cert-testing-north", "starting": ["accelerextestapp-dev-cert-testing-north-1"], "ready": ["accelerextestapp-dev-cert-testing-north-0"]}
Here is also a snippet of the configuration:
service:
  type: DataGrid
  container:
    storage: 1Gi
  sites:
    local:
      name: dev-westeurope-01
      discovery:
        launchGossipRouter: true
        memory: "2Gi:1Gi"
        cpu: "2000m:1000m"
      expose:
        type: LoadBalancer
        port: 65534
      maxRelayNodes: 2
      encryption:
        protocol: TLSv1.2
        transportKeyStore:
          secretName: xsite-keystore
          alias: xsite
          filename: keystore.p12
        routerKeyStore:
          secretName: xsite-keystore
          alias: xsite
          filename: keystore.p12
        trustStore:
          secretName: xsite-truststore
          filename: truststore.p12
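To relate this configuration to the startup failure, the following rough sketch derives the file paths the server would look for from the `encryption` block. It assumes the snippet has been parsed into a Python dict, and that the mount directories are the two quoted earlier in this issue; the function name `expected_server_files` is hypothetical. `routerKeyStore` is left out because the issue does not show where the Gossip Router mounts it.

```python
import posixpath

# Mount directories as quoted in this issue (assumed, not read from operator code).
MOUNT_DIRS = {
    "transportKeyStore": "/etc/encrypt/transport-site-tls",
    "trustStore": "/etc/encrypt/truststore-site-tls",
}


def expected_server_files(spec: dict) -> dict:
    """Map each referenced secret name to the file path the server pod would read."""
    encryption = (
        spec.get("service", {})
        .get("sites", {})
        .get("local", {})
        .get("encryption", {})
    )
    files = {}
    for key, mount_dir in MOUNT_DIRS.items():
        store = encryption.get(key)
        # Skip stores that are absent or do not name both a secret and a file.
        if store and "secretName" in store and "filename" in store:
            files[store["secretName"]] = posixpath.join(mount_dir, store["filename"])
    return files
```

For the configuration above, the `xsite-keystore` entry resolves to `/etc/encrypt/transport-site-tls/keystore.p12`, which is the path named in the `ISPN080028` error.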
Thanks for raising the issue @andrey-dubnik, we'll try to take a look at this soon.
Did you get a chance to see what may be wrong with it?
@andrey-dubnik This issue has been added to our backlog, but no progress has been made yet. We'll make sure to reference this issue when a PR is raised.