cryostat-legacy

Gracefully stop recording on the target's pod termination

Open zhelanov opened this issue 2 years ago • 3 comments

One of the main use cases is seeing what your app was doing in its last moments, but currently I can't do that.

How to reproduce:

  1. In the Cryostat web interface, add your target JVM
  2. Start any type of recording (continuous or not)
  3. Restart your app's deployment/daemonset/etc. in the K8S cluster
  4. Once the old pod is terminated, check the recording status

Expected result: The recording is stopped and available to be downloaded

Actual result: The recording has disappeared

Environment: AWS EKS 1.21.5, Cryostat f1810ea7 (installed via Helm chart)

Jun 29 '22 11:06 zhelanov

This is something difficult to handle, unfortunately. When k8s shuts down the application pod the JVM is simply killed, and there is not much time or any warning for Cryostat to open a connection and copy the JFR data out of the JVM before this happens.

The best-fit workflow for this use case is to use an Automated Rule that matches your target application and give it a suitably short archival period. For example, if you set the period to 10 seconds, then Cryostat will start a recording when your application appears, and every 10 seconds it will copy the JFR data into Cryostat's archives. You can also configure a maximum number of archived copies to retain so that the archive doesn't grow unbounded. This way, if your application crashes or is terminated, you can see what it was doing up to within its last 10 seconds.
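For illustration, a rule like the one described might be defined with a JSON body along the following lines. This is only a sketch: the field names (matchExpression, eventSpecifier, archivalPeriodSeconds, preservedArchives) are written from memory of the Cryostat 2.x rules API and should be checked against the documentation for your version. The rule name, the alias com.example.App, and the retention count of 6 are made-up values. The Java helper only builds the JSON text; it does not talk to Cryostat.

```java
class RuleSketch {
    // Builds a rule definition as JSON text.
    // ASSUMPTION: field names follow the Cryostat 2.x rules API as recalled; verify before use.
    // Note: inputs are not JSON-escaped, so avoid double quotes in the arguments.
    static String ruleJson(String name, String matchExpression,
                           int archivalPeriodSeconds, int preservedArchives) {
        return "{\n"
            + "  \"name\": \"" + name + "\",\n"
            + "  \"matchExpression\": \"" + matchExpression + "\",\n"
            + "  \"eventSpecifier\": \"template=Continuous,type=TARGET\",\n"
            + "  \"archivalPeriodSeconds\": " + archivalPeriodSeconds + ",\n"
            + "  \"preservedArchives\": " + preservedArchives + "\n"
            + "}";
    }

    public static void main(String[] args) {
        // Hypothetical values: archive every 10 seconds, keep the 6 most recent copies.
        System.out.println(ruleJson("lastMoments", "target.alias == 'com.example.App'", 10, 6));
    }
}
```

The trade-off is between the archival period and load: a shorter period narrows the window of data lost on termination but means more frequent copies out of the target JVM.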

Jun 29 '22 11:06 andrewazores

Got your point, thanks!

Could it be solved by mounting a shared volume into both the Cryostat pod and the pod with the target JVM? I understand that it adds complexity, but the case must be quite common.

Jun 30 '22 08:06 zhelanov

Yes, there are possible setups where you mount a shared volume to your target JVM pods and configure the target JVM with, e.g., -XX:FlightRecorderOptions=defaultrecording=true,dumponexit=true,dumponexitpath=path, where path is a directory on the mounted volume. This way the target JVMs will write a timestamped JFR file on exit, assuming k8s gives them a SIGTERM to allow for graceful termination, and not just a SIGKILL.

With this scheme you would also need some way to tell which target JVM dumped which file into the shared volume, I think. Perhaps you would use different subdirectories within the volume for different services.
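As a rough sketch of the subdirectory idea, the helper below derives one dump directory per service and pod and prints the matching JVM flag. The mount point /jfr-dumps, the service name orders, and the use of the HOSTNAME environment variable as the pod name are assumptions for illustration only. The first flag is the one quoted above; on JDK 11 and later the equivalent is -XX:StartFlightRecording with dumponexit and filename. Whether filename may point at a directory rather than a file depends on the JDK version, so check the java command documentation for yours. The directory must also exist before the JVM exits.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class DumpPathSketch {
    // One subdirectory per service and pod, so each dumped file can be traced to its source.
    static Path dumpDir(Path volumeRoot, String service, String podName) {
        return volumeRoot.resolve(service).resolve(podName);
    }

    // Option string as quoted in the thread (older JDK style).
    static String legacyFlag(Path dir) {
        return "-XX:FlightRecorderOptions=defaultrecording=true,dumponexit=true,dumponexitpath=" + dir;
    }

    // Form used on JDK 11 and later.
    static String startFlightRecordingFlag(Path dir) {
        return "-XX:StartFlightRecording=dumponexit=true,filename=" + dir;
    }

    public static void main(String[] args) {
        // ASSUMPTION: HOSTNAME carries the pod name, as is typical in Kubernetes containers.
        String pod = System.getenv().getOrDefault("HOSTNAME", "unknown-pod");
        Path dir = dumpDir(Paths.get("/jfr-dumps"), "orders", pod);
        System.out.println(legacyFlag(dir));
        System.out.println(startFlightRecordingFlag(dir));
    }
}
```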

Anyway, it's a good point and a use case we should explore with Cryostat in the future. Perhaps we can add some configuration to look in such a shared directory and include it within Cryostat's archives so that these dumped files can also be analyzed in the cloud using Cryostat.

Jun 30 '22 13:06 andrewazores

Hi @zhelanov ,

The Cryostat 2.3 release is imminent, and part of that is the Cryostat Agent. With the Agent hooked up to your target application and a few configuration options set (environment variables or system properties), the Agent can be made to solve your original request: when the target JVM is shutting down, the Agent will block shutdown, push the latest JFR data in the target to Cryostat, and then unblock shutdown. This way the tail end of the recording data from the target can be captured and retained. This operation must not block shutdown for too long, however, or else the container platform will likely end up forcibly killing the target application and interrupting the JFR push, so some parameter tuning may be required.
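The "block shutdown, but not for too long" pattern can be sketched with a plain JVM shutdown hook that gives the push a fixed deadline. This is not the Agent's actual implementation: the class name, the 5 second deadline, and the placeholder push task are all hypothetical. In Kubernetes the deadline would need to stay below the pod's terminationGracePeriodSeconds (30 seconds by default), after which the container is sent SIGKILL.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedShutdownPush {
    // Runs the push task and waits at most timeoutMs for it.
    // Returns true only if the task completed normally within the deadline.
    static boolean runWithDeadline(Runnable push, long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "jfr-push");
            t.setDaemon(true); // never keep the JVM alive on behalf of the push
            return t;
        });
        Future<?> result = pool.submit(push);
        try {
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the push and let shutdown proceed
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false; // the push itself failed
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // The hook runs on SIGTERM or normal exit; it cannot run on SIGKILL.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            boolean pushed = runWithDeadline(() -> {
                // Hypothetical placeholder for uploading the latest JFR data.
            }, 5_000);
            System.err.println(pushed ? "JFR push completed" : "JFR push abandoned");
        }));
    }
}
```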

Apr 25 '23 19:04 andrewazores