pega-helm-charts

Persistence of GC logs and Heap dumps in case of system failures.

Open yashwanth-pega opened this issue 4 years ago • 4 comments

Requirement 1: Garbage collection (GC) logs are currently written to a directory on the pod's local file system (for example, /usr/local/tomcat/logs/). After a system failure or JVM crash, these logs help diagnose the issue from a garbage collection viewpoint. However, when the pod crashes because of such a failure, the logs are lost.

Requirement 2: When diagnosing system failures related to OutOfMemoryError (OOME), a heap dump gives valuable insight into the issue. Collecting heap dumps at frequent intervals isn't practical. Fortunately, the JVM collects a heap dump automatically on OOME if the setting -XX:+HeapDumpOnOutOfMemoryError is used. However, when the pod crashes, the collected dumps are lost, just like the GC logs.

Possible solutions:

  1. A mechanism to persist this data at frequent intervals (likely overkill).
  2. Write the data to persistent storage (say, S3 buckets) before the pod crashes.

The JVM provides a way to execute a command or script when an OOME is first thrown, via the setting -XX:OnOutOfMemoryError=<SOME_COMMAND>. This gives us a chance to run a script that persists the required data (GC logs and heap dump) before the pod crashes.
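
To make the second option concrete, here is a minimal sketch of a script that could be passed to -XX:OnOutOfMemoryError. It is not part of the charts; the function name `persist_diagnostics`, the file patterns, and the idea of copying to a mounted persistent directory (rather than uploading to S3, which would need an external client) are all assumptions for illustration. It also assumes the heap dump is already on disk when the script runs.

```python
import shutil
import sys
from pathlib import Path


def persist_diagnostics(src_dir, dest_dir, patterns=("*.hprof", "gc*.log*")):
    """Copy files matching the patterns from src_dir into dest_dir.

    Hypothetical helper: the patterns are guesses at typical heap dump
    and GC log file names, not names taken from the Pega images.
    Returns the list of copied file names.
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        for path in sorted(src.glob(pattern)):
            # Skip directories and files already copied by an earlier pattern.
            if path.is_file() and path.name not in copied:
                shutil.copy2(path, dest / path.name)
                copied.append(path.name)
    return copied


if __name__ == "__main__":
    # Hypothetical invocation: persist_diagnostics.py <log_dir> <persistent_dir>
    if len(sys.argv) == 3:
        for name in persist_diagnostics(sys.argv[1], sys.argv[2]):
            print(f"persisted {name}")
```

Under these assumptions, the JVM setting would look something like -XX:OnOutOfMemoryError="python3 persist_diagnostics.py /usr/local/tomcat/logs/ <PERSISTENT_DIR>", where the script location and <PERSISTENT_DIR> are placeholders.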

Note: Also consider how this problem is dealt with in Pega cloud systems.

yashwanth-pega Jan 13 '21 10:01

Refer to the respective JVM parameters in the following document: https://pegasystems.sharepoint.com/sites/ScalableExecutionEngineEaaS/SitePages/JVM-Flags-Analysis.aspx?web=1

yashwanth-pega Jan 13 '21 10:01

Note that because the SDEA images depend on JDK 8, the JVM argument for GC log collection, which requires JDK 9+, was reverted. Refer to the following: https://github.com/pegasystems/docker-pega-web-ready/pull/96

This (adding the GC logs flag) needs to be addressed as part of this issue or a separate one, because the flag is a prerequisite for one of the stated requirements.
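
One way to keep GC logging on both JDK 8 and JDK 9+ images is to choose the flag by Java major version. The sketch below assumes the reverted argument was the JDK 9+ unified-logging `-Xlog` form, which this thread does not confirm; the function name `gc_log_flags` and the file name gc.log are hypothetical.

```python
def gc_log_flags(java_major_version, log_file):
    """Return GC logging flags for the given Java major version.

    Hypothetical helper for illustration only.
    """
    if java_major_version >= 9:
        # JDK 9+ uses unified logging.
        return [f"-Xlog:gc*:file={log_file}"]
    # JDK 8 uses the older dedicated GC logging flags.
    return [f"-Xloggc:{log_file}", "-XX:+PrintGCDetails", "-XX:+PrintGCDateStamps"]


# Example: flags for a JDK 8 image writing to the log directory named in the issue.
print(gc_log_flags(8, "/usr/local/tomcat/logs/gc.log"))
```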

yashwanth-pega Jan 18 '21 05:01

Related issue: https://github.com/pegasystems/pega-helm-charts/issues/125

yashwanth-pega Jan 18 '21 05:01

@yashwanth-pega could you override the heap dump location via a custom env variable (part of the tier definition and used here) to write to a persistent volume?
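
A rough sketch of that idea: build the heap dump flags from an environment variable so the dump path can point at a mounted persistent volume. The variable name `HEAP_DUMP_PATH` and the function `heap_dump_flags` are hypothetical and are not taken from the charts or images; only the two JVM flags are real.

```python
import os


def heap_dump_flags(env=None):
    """Return JVM heap dump flags, honouring an optional HEAP_DUMP_PATH.

    HEAP_DUMP_PATH is a hypothetical variable name used for illustration.
    """
    env = os.environ if env is None else env
    flags = ["-XX:+HeapDumpOnOutOfMemoryError"]
    path = env.get("HEAP_DUMP_PATH", "").strip()
    if path:
        # Direct the dump to the configured location, e.g. a persistent volume mount.
        flags.append(f"-XX:HeapDumpPath={path}")
    return flags
```

If the variable is unset, only -XX:+HeapDumpOnOutOfMemoryError is returned and the JVM falls back to its default dump location.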

APegaDavis Jun 05 '23 18:06

Fixed in #726

kishorv10 Apr 23 '24 10:04