parca-agent icon indicating copy to clipboard operation
parca-agent copied to clipboard

Clean tmp dir on start-up

Open zdyj3170101136 opened this issue 1 year ago • 9 comments

Describe the bug agent continous oom, and produce too much data in tmp dir.

Expected behavior server should not oom, even oom it should not produce too much data.

server should clean up all tmp dir when restarted. Screenshots or Profiles (using https://pprof.me)

截屏2023-11-15 下午5 55 27

Reason

the agent want to upload debuginfo of clickhouse server, which is 2GB. this request is rejected by parca server in default when server is normal.

but my s3 have some problem, so the server would retuen all InitialUploadRequest with true.

and my agent version is v0.15.0, so it will return true when err is not nil: https://github.com/parca-dev/parca-agent/blob/v0.15.0/pkg/debuginfo/manager.go#L329

the agent would create tmp debuginfo when InitialUploadRequest err is not nil.

the agent would easily oom with too much concurrent creating of tmp debuginfo file.

the agent would create permanent tmp debuginfo file when oom.

so finally the disk usage is 100 percent.

zdyj3170101136 avatar Nov 15 '23 10:11 zdyj3170101136

the bug is fixed in newly released.

but i want the agent cleanup all tmp directory when started to prevent any possible oom.

zdyj3170101136 avatar Nov 15 '23 10:11 zdyj3170101136

I think cleaning up the debuginfo temp dir is a valid request. Do you want to create a PR to add this?

brancz avatar Nov 15 '23 10:11 brancz

I think cleaning up the debuginfo temp dir is a valid request. Do you want to create a PR to add this?

i noticed the shouldInitiateUploadResponseCache is removed in latest agent.

Is it necessary? it could reduce many calls and correspond s3 request.

zdyj3170101136 avatar Nov 16 '23 03:11 zdyj3170101136

Did you find the commit that removed it?

brancz avatar Nov 16 '23 09:11 brancz

i noticed the shouldInitiateUploadResponseCache is removed in latest agent.

If this is the case, it's not intentional. Could you point us to the culprit commit?

kakkoyun avatar Nov 16 '23 13:11 kakkoyun

i noticed the shouldInitiateUploadResponseCache is removed in latest agent.

If this is the case, it's not intentional. Could you point us to the culprit commit?

https://github.com/parca-dev/parca-agent/commit/127ce4ec2d8f1fdf3fa2c6b2674b7acbddeab9f2#diff-5c4a0ca9a2747c99b32f629099e552b2582da981ff90f6bcedd6044dbe11e359L220

zdyj3170101136 avatar Nov 17 '23 06:11 zdyj3170101136

i noticed the shouldInitiateUploadResponseCache is removed in latest agent.

If this is the case, it's not intentional. Could you point us to the culprit commit?

127ce4e#diff-5c4a0ca9a2747c99b32f629099e552b2582da981ff90f6bcedd6044dbe11e359L220

I don't think this is the root cause of the problem. You can see in the same changeset, we merely changed the location of the cache https://github.com/parca-dev/parca-agent/commit/127ce4ec2d8f1fdf3fa2c6b2674b7acbddeab9f2#r133359657

kakkoyun avatar Nov 23 '23 16:11 kakkoyun

However, we can add something to clean the given temporary directory as part of the start-up sequence or periodically.

Contributions are welcome 🤗

kakkoyun avatar Nov 23 '23 16:11 kakkoyun

I think on startup sounds good!

brancz avatar Nov 23 '23 16:11 brancz