[pyroscope.java] Old JFR files accumulating on disk
What's wrong?
When Alloy is terminated, the JFR files written to disk by the pyroscope.java component never seem to be cleaned up. In an extreme scenario where Alloy is restarted frequently, for example when it is OOMKilled on Kubernetes, the files could gradually accumulate and start filling up the host's disk.
This is making us take a gradual approach to adopting continuous profiling, since we're worried about disk pressure on our Kubernetes nodes. Not sure how valid a concern that is, but it would be nice if Alloy cleaned up any old JFR files, for example during start-up.
Steps to reproduce
- Set up Alloy to auto-profile Java applications, for example using this recipe: https://github.com/grafana/pyroscope/blob/main/examples/grafana-agent-auto-instrumentation/java/kubernetes/grafana-alloy.yaml
- Restart Alloy and wait until it starts triggering profiling
- Check the /tmp directory of the Java application container. Except after a rare race condition, there should now be two JFR files
System information
Ubuntu Linux 22.04.3 x86_64
Software version
v1.4.3
Configuration
https://github.com/grafana/pyroscope/blob/main/examples/grafana-agent-auto-instrumentation/java/kubernetes/grafana-alloy.yaml
Logs
No response
@swar8080 let us know if https://github.com/grafana/alloy/pull/3630 improves the situation for you and if this can be closed.
We might be able to avoid disk use altogether, e.g. by using named pipes. Let's see if this improves things enough.
Closing this as we think the root problem is resolved here.