parca-agent
Clean tmp dir on start-up
Describe the bug: the agent OOMs continuously and writes too much data to the tmp dir.
Expected behavior: the agent should not OOM, and even if it does, it should not produce this much data.
The agent should clean up the whole tmp dir when it is restarted.
Reason
The agent wants to upload the debuginfo of a ClickHouse server, which is 2GB. By default, the Parca server rejects this request when the server is working normally.
But my S3 has a problem, so the server returns true for every InitialUploadRequest.
My agent version is v0.15.0, which returns true when err is not nil: https://github.com/parca-dev/parca-agent/blob/v0.15.0/pkg/debuginfo/manager.go#L329
The agent creates a temporary debuginfo file when the InitialUploadRequest err is not nil.
The agent easily OOMs when too many temporary debuginfo files are being created concurrently.
When the agent OOMs, the temporary debuginfo files it was writing are left behind permanently.
So eventually the disk usage reaches 100 percent.
The bug is fixed in the latest release.
But I want the agent to clean up the whole tmp directory on start-up, so that leftovers from any future OOM do not accumulate.
I think cleaning up the debuginfo temp dir is a valid request. Do you want to create a PR to add this?
I noticed that shouldInitiateUploadResponseCache was removed in the latest agent.
Was removing it necessary? The cache could avoid many calls and the corresponding S3 requests.
Did you find the commit that removed it?
If this is the case, it's not intentional. Could you point us to the culprit commit?
https://github.com/parca-dev/parca-agent/commit/127ce4ec2d8f1fdf3fa2c6b2674b7acbddeab9f2#diff-5c4a0ca9a2747c99b32f629099e552b2582da981ff90f6bcedd6044dbe11e359L220
I don't think this is the root cause of the problem. As you can see in the same changeset, we merely changed the location of the cache: https://github.com/parca-dev/parca-agent/commit/127ce4ec2d8f1fdf3fa2c6b2674b7acbddeab9f2#r133359657
However, we can add something to clean the given temporary directory as part of the start-up sequence or periodically.
Contributions are welcome 🤗
I think on startup sounds good!