
One file per process instead of per metric/process for DirectFileStore [DOESN'T WORK]

Open dmagliola opened this issue 5 years ago • 8 comments

This doesn't work!!

I'm just leaving it here as documentation in case someone tries to do this to solve #143. Read the commit comments for more.

NOTE: I still think this is the right thing to do (one file per process, rather than per metric/process). It'll have more contention in multi-threaded scenarios, but it'll drastically reduce the number of files.

NOTE2: We also need to add a Mutex around that shared Dict, if we're going to have a shared Dict! The @lock in in_process_sync needs to be shared by all MetricStores.
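
Roughly what I mean, as a minimal, untested sketch (illustrative names only; the real DirectFileStore keeps a per-instance @lock):

```ruby
# Untested sketch: one process-wide Mutex shared by every MetricStore,
# instead of the per-instance @lock that DirectFileStore creates today.
class MetricStore
  SHARED_LOCK = Mutex.new # one lock for all MetricStore instances in this process

  def in_process_sync
    # Every MetricStore serialises on the same lock, since they would all be
    # writing to the same shared FileMappedDict.
    SHARED_LOCK.synchronize { yield }
  end
end
```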

Anyway, I'm back to thinking we should either make a second DirectFileStore that does this (and have two separate files, instead of a param), or considerably refactor the DirectFileStore. My quick stab definitely doesn't work.

dmagliola avatar Oct 14 '19 11:10 dmagliola

Coverage Status

Coverage decreased (-16.0%) to 84.042% when pulling 149fc3ce62aa3971105186507d9a0b24a13aaa1f on all_metrics_one_file into af2906641ba2a7eaad82ca0edf334da60c18e7f6 on master.

coveralls avatar Oct 14 '19 11:10 coveralls

@dmagliola Do you know if there's a working model of this proposed approach somewhere? I'd like to give it a try if possible. We are currently running into this issue due to our use of unicorn_worker_killer and the number of files that end up getting created whenever a new worker process gets spun up. Over time the /metrics endpoint becomes very slow and memory usage grows without bound.

saviogl avatar Mar 24 '21 07:03 saviogl

Hi @saviogl, sadly there isn't one. This is next on my "things to fix" list, and I'm hoping to get to it "in the next couple of weeks", but I don't want to promise more than I can deliver.

The way to try to solve this is to declare your own class that exposes the FileStore interface, and point to that when declaring metrics.

The key thing to keep in mind here is that, while you'll have a lot of MetricStores, they all need to point to the same, shared instance of FileMappedDict, which you'd define at the FileStore level, not in the MetricStores as we do in DirectFileStore.

It's a non-trivial change, but that's basically what needs to happen.
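
Very roughly, something like this (an untested sketch: for_metric / set / increment / get / all_values follow the data store interface, but the plain Hash and the JSON key encoding are just stand-ins for the real FileMappedDict logic):

```ruby
require 'json'

# Untested sketch: the store owns ONE shared dict and ONE lock for the whole
# process, and hands that same pair to every MetricStore it creates.
class SingleFileStore
  def initialize(dir:)
    @dir = dir            # where the real per-process file would live
    @lock = Mutex.new     # shared by all MetricStores created by this store
    @shared_dict = {}     # stand-in for a FileMappedDict backed by one file
  end

  # One MetricStore per metric, as in DirectFileStore, but they all point to
  # the store-level dict and lock instead of opening their own file.
  def for_metric(metric_name, metric_type:, metric_settings: {})
    MetricStore.new(metric_name: metric_name, dict: @shared_dict, lock: @lock)
  end

  class MetricStore
    def initialize(metric_name:, dict:, lock:)
      @metric_name = metric_name
      @dict = dict
      @lock = lock
    end

    def synchronize
      @lock.synchronize { yield }
    end

    def set(labels:, val:)
      synchronize { @dict[key_for(labels)] = val.to_f }
    end

    def increment(labels:, by: 1)
      synchronize do
        key = key_for(labels)
        @dict[key] = (@dict[key] || 0.0) + by
      end
    end

    def get(labels:)
      synchronize { @dict[key_for(labels)] || 0.0 }
    end

    def all_values
      synchronize { @dict.dup } # a real store would decode keys back into label sets
    end

    private

    # Namespace keys by metric name so every metric can share one dict/file.
    def key_for(labels)
      [@metric_name, labels].to_json
    end
  end
end
```

You'd then point the client at it globally, with Prometheus::Client.config.data_store = SingleFileStore.new(dir: ...), and every metric registered after that shares the one dict, which is the part DirectFileStore doesn't do today.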

The challenge for me is that I wanted to offer the option: a file per process, or a file per metric/process. Unfortunately, you can't do that with a flag as this PR proposes (or not without making the code a mess), so we'll probably have to either choose one or the other to offer, or offer two separate DataStores to pick from, which would have a lot of code duplication.

I'm sure there's a better way I'm missing, but anyway, that's my thinking on this.

dmagliola avatar Mar 24 '21 12:03 dmagliola

@dmagliola Just trying to understand better whether this will actually solve our issue. While the number of files is a problem we'll have (maybe soon), the biggest issue right now is memory consumption: we are "constantly" reaping unicorn worker processes for graceful memory management, and every new worker process creates new metrics of its own. Moving to "one file per process instead of per metric/process for DirectFileStore" would tame the number of files created, but the amount of data (metrics per process) would remain the same, correct? And therefore we would still see unbounded growth in metrics, and in the memory needed to export them?

Is my assumption correct?

If I'm understanding the current model correctly, the solution for the unbounded memory growth would be to have processes write to the same file/metric, with some inter-process concurrency control in place, right?

saviogl avatar Mar 24 '21 20:03 saviogl

While the number of files is a problem we'll have (maybe soon), the biggest issue right now is memory consumption: we are "constantly" reaping unicorn worker processes for graceful memory management, and every new worker process creates new metrics of its own.

If I've understood you right, this comment in #214 is the situation you're running into, and you're generating a tonne of metric series by churning through unicorn workers. Whatever we come up with there won't help with scraping becoming slow because of a large number of files, but it might help you reduce the number of metrics (and ultimately time series in your Prometheus instance) produced.

Sinjo avatar Mar 25 '21 00:03 Sinjo

FYI: GitLab uses a similar implementation: https://gitlab.com/gitlab-org/prometheus-client-mmap

wizardmatas avatar Jun 02 '21 11:06 wizardmatas