Mangle 3.0 Stability - Cassandra DB goes down suddenly
Environment:
- OpenShift Version: v4.6.36
- Kubernetes Version: v1.19.0
- Mangle Version: 3.0
Issue:
- Cassandra DB goes down with failed connections, causing the Mangle POD to retry its connection to the Cassandra DB multiple times
- The Mangle UI is unavailable for the entire duration
Interim Solution Being Followed:
- Restart Cassandra POD
- Restart mangle POD
- Increase the resource limits in the Cassandra StatefulSet template, as recommended by the Mangle team during the working session.
Previous:
```yaml
resources:
  limits:
    cpu: '1'
    memory: 8Gi
  requests:
    cpu: '500m'
    memory: 2Gi
```
Current:
```yaml
resources:
  limits:
    cpu: '2'
    memory: 8Gi
  requests:
    cpu: '1'
    memory: 4Gi
```
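For context, a minimal sketch of where the current values sit inside a StatefulSet pod template. The StatefulSet name, Service name, labels and image below are placeholders (assumptions), not values taken from the attached cassandra_statefulset_template.txt.

```yaml
# Sketch only: names, labels and image are hypothetical placeholders.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra                  # hypothetical StatefulSet name
spec:
  serviceName: cassandra           # hypothetical headless Service
  selector:
    matchLabels:
      app: cassandra               # hypothetical label
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
        - name: cassandra          # hypothetical container name
          image: <cassandra-image> # placeholder; use the image from the attached template
          resources:
            limits:
              cpu: '2'
              memory: 8Gi
            requests:
              cpu: '1'
              memory: 4Gi
```

Note that only the CPU values and the memory request changed; the memory limit stayed at 8Gi. Because requests are lower than limits, the pod falls in the Burstable QoS class. Setting requests equal to limits for both CPU and memory would place it in the Guaranteed class, which is generally the last to be evicted under node memory pressure.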
Frequency of this issue: once every few weeks, typically every 7-8 weeks, although it can also occur at random intervals.
Logs:
- Please find attached the logs from the Mangle and Cassandra PODs, captured when this outage last occurred (last week of February 2022):
cassandra_pod_failure_0227.txt mangle_pod_failure_0227.txt
Deployment Templates:
- Please find attached the Cassandra StatefulSet and Mangle Deployment templates: cassandra_statefulset_template.txt, mangle_deployment_template.txt
Hi @Anvesh42, let us know how stable the Cassandra pod is after increasing the resource limits.
@rpraveen-vmware I have increased the resources in the Cassandra configuration as discussed during our session. I will monitor it for a few days and observe its stability. Thanks!
@ashrimalivmware @rpraveen-vmware Even after increasing the resource limits (as stated above), the Cassandra POD still goes down. Attaching the latest log: cassandra_04182022.txt.
@Anvesh42 How often does the Cassandra pod go down now, with the increased resource limits? cc: @ashrimalivmware
@ashrimalivmware @rpraveen-vmware Can you please share the Dockerfiles for Mangle and Cassandra that were used to build these standard images?
Regarding Cassandra POD stability, I am exploring options for a possible solution.
Thanks, Anvesh
@ashrimalivmware @rpraveen-vmware
Following up on my previous query in this thread, we would like some insight into the modifications we can make to prevent the Cassandra POD from going down so frequently. Please let us know. The details are provided below.
Cassandra POD resources & ENV values:

Latest Cassandra Failure Log:
We also observe that the standard cassandra.yaml provided by VMware does not define a liveness probe. Could that be one of the reasons?
Thanks, Anvesh
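On the missing liveness probe: a minimal sketch of probes that could be added to the Cassandra container in the StatefulSet. The container name and all delays, periods and thresholds are assumptions to tune, not values from the VMware-provided cassandra.yaml. Port 9042 is Cassandra's default CQL port, and the readiness check assumes `nodetool` is on the image's PATH with JMX reachable locally.

```yaml
# Fragment for the Cassandra container in the StatefulSet pod template.
# All values are illustrative assumptions, not taken from the VMware cassandra.yaml.
containers:
  - name: cassandra              # hypothetical container name
    livenessProbe:
      tcpSocket:
        port: 9042               # default CQL native transport port
      initialDelaySeconds: 120   # give Cassandra time to start before probing
      periodSeconds: 20
      timeoutSeconds: 5
      failureThreshold: 3        # restart after ~1 minute of failed checks
    readinessProbe:
      exec:
        command:
          - /bin/sh
          - -c
          - nodetool status | grep -q '^UN'   # some node reports Up/Normal
      initialDelaySeconds: 60
      periodSeconds: 15
      timeoutSeconds: 10
```

A liveness probe would only automate the "Restart Cassandra POD" step of the interim workaround; it would not address the underlying cause, so the failure logs are still needed for that. The Mangle POD may also still need a restart if it does not reconnect on its own. Keep `initialDelaySeconds` generous, because an aggressive probe on a slow-starting Cassandra can cause restart loops. On a multi-node ring, the readiness check above only confirms that some node is Up/Normal.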