
Safe compress/delete in multi process environment

Open michaldo opened this issue 1 year ago • 2 comments

Consider the case when an application is running on Kubernetes. The number of pods may vary.

With the output file prefix set to the pod id, output files from different pods will not collide. However, it is possible that two pods start a compression or deletion at the same time and modify the same files. I think some kind of locking is required.

The File API has locking, but I heard the behavior is OS specific. I'm also afraid that a File API lock may not work in the cloud-volume world. Do you agree? Another option is to use a file like /.lock as a semaphore. Can you share a best practice for synchronizing Java processes over a semaphore file?
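Something like this is what I have in mind, just a sketch (the lock-file path is made up, and I don't know whether tryLock is reliable on a shared cloud volume):

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileLock;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class ArchiveLock {

        // hypothetical lock file on the volume shared by all pods
        private static final Path LOCK_FILE = Path.of("/profiler/.lock");

        public static void withLock(Runnable compressOrDelete) throws IOException {
            try (FileChannel channel = FileChannel.open(LOCK_FILE,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                // tryLock() returns null when another process already holds the lock
                FileLock lock = channel.tryLock();
                if (lock == null) {
                    return; // another pod is compressing/deleting right now, skip this cycle
                }
                try {
                    compressOrDelete.run();
                } finally {
                    lock.release();
                }
            }
        }
    }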

michaldo avatar Oct 19 '23 07:10 michaldo

The easiest way to avoid collision is to set:

  • async-profiler.continuous.output-dir.continuous
  • async-profiler.continuous.output-dir.archive

to different paths. The path can contain the pod id, and that solves the problem, I believe.
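For example, something along these lines (just a sketch: the /profiler base path is made up, the properties must be set before the library reads them, e.g. in your Spring configuration or as system properties set early enough, and in Kubernetes the HOSTNAME environment variable defaults to the pod name):

    // sketch only: derive per-pod output directories from the pod name
    String podId = System.getenv("HOSTNAME"); // the pod name in Kubernetes
    System.setProperty("async-profiler.continuous.output-dir.continuous",
            "/profiler/" + podId + "/continuous");
    System.setProperty("async-profiler.continuous.output-dir.archive",
            "/profiler/" + podId + "/archive");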

Any locks on the filesystem are hard to manage. You need to find a solution for the "node that held the lock was killed" case and so on. If you really need the functionality of saving files to the same directory from multiple JVMs, then I believe this is a better solution:

The library is aware of the files that are generated:

        // builds the async-profiler arguments, including the path of the .jfr
        // output file this JVM is about to generate
        return String.format(
                "jfr,event=%s%s,file=%s/%s-%s.jfr",
                event,
                additionalParameters,
                notManageableProperties.getContinuousOutputDir(),
                event,
                date
        );

We can add an in-memory (concurrent) collection that stores the files generated by a single JVM. On that basis we can compress/delete/move to the archive only our own files.
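Just a sketch of the idea (not the library's code, the names are made up):

    import java.nio.file.Path;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // remembers which .jfr files this JVM created, so compression, deletion
    // and moving to the archive only ever touch files owned by this JVM
    public class GeneratedFiles {

        private final Set<Path> files = ConcurrentHashMap.newKeySet();

        public void register(Path jfrFile) {    // called right after a file is generated
            files.add(jfrFile);
        }

        public Set<Path> ownedFiles() {         // the compressor/cleaner iterates only over these
            return Set.copyOf(files);
        }

        public void forget(Path jfrFile) {      // called once the file is archived or deleted
            files.remove(jfrFile);
        }
    }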

krzysztofslusarski avatar Oct 19 '23 08:10 krzysztofslusarski

The easiest way to avoid collision is to set (output dirs) to different paths. The path can contain the pod id, and that solves the problem, I believe.

A directory per pod (which is ephemeral by nature) is a bad idea. There will be plenty of directories, hard to manage and hard to select by time. When a pod is deleted, nobody will care to clean up its output files, because its directory will no longer be assigned to any live pod.

You need to find a solution for "node that held the lock was killed" and so on.

That should not be so hard; the lock may have a time limit. I bet this problem is already solved, but I could not find a solution.
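What I mean by a time limit, just as a sketch (the path and timeout are made up, and the check-then-create step is obviously racy, so this is not a real solution):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.time.Duration;
    import java.time.Instant;

    // sketch only: treat the lock as expired after a timeout, so a killed pod
    // cannot block compression/deletion forever
    public class TimeLimitedLock {

        private static final Path LOCK_FILE = Path.of("/profiler/.lock"); // made-up path
        private static final Duration TIMEOUT = Duration.ofMinutes(10);   // made-up limit

        static boolean tryAcquire() throws IOException {
            if (Files.exists(LOCK_FILE)) {
                Instant lockedAt = Files.getLastModifiedTime(LOCK_FILE).toInstant();
                if (Instant.now().isBefore(lockedAt.plus(TIMEOUT))) {
                    return false;           // someone else holds a fresh lock
                }
                Files.delete(LOCK_FILE);    // stale lock left by a killed pod
            }
            Files.createFile(LOCK_FILE);    // racy between exists() and createFile()
            return true;
        }
    }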

Anyway, for now I think the best option is to leave compression and deletion unsafe. In the worst case some profiler output files will be broken, which is acceptable.

michaldo avatar Oct 19 '23 09:10 michaldo