prometheus-engine

Split ConfigMaps by size at creation time, so that multiple ConfigMaps of at most 950KB each are created and rule-evaluator loads all of them as rule files

Open balakumarpg opened this issue 2 months ago • 7 comments

Why: Kubernetes ConfigMaps have a hard limit of 1MB. When more GlobalRules, ClusterRules, or Rules need to be created, rule-generator puts all of them into a single ConfigMap, which is then mounted as a volume into rule-evaluator. As a result, the alert definitions in a GKE cluster with GMP enabled cannot exceed 1MB in total.

This change will overcome that hard limit.

balakumarpg avatar Nov 10 '25 15:11 balakumarpg

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

google-cla[bot] avatar Nov 10 '25 15:11 google-cla[bot]

Thanks!

So this makes sense on the operator side, but how should rule-eval consume this? rule-eval currently reads only one ConfigMap, and there's no easy way (or is there?) to tell the rule-eval pod to load a dynamic number of those files. 🤔

In practice we can't change the Deployment dynamically within the security constraints we currently have to work with for the managed GMP solution. That would be the only solution, right?

bwplotka avatar Nov 18 '25 16:11 bwplotka

> Thanks!
>
> So this makes sense on the operator side, but how should rule-eval consume this? rule-eval currently reads only one ConfigMap, and there's no easy way (or is there?) to tell the rule-eval pod to load a dynamic number of those files. 🤔
>
> In practice we can't change the Deployment dynamically within the security constraints we currently have to work with for the managed GMP solution. That would be the only solution, right?

Thanks for the valuable input. Please take another look; I have made some changes in response to your comments.

balakumarpg avatar Nov 23 '25 22:11 balakumarpg

Nice, sounds like the plan would be to use projections and split by at least 3. Does that solve your use case? Do you have a more or less equal distribution of rules across those types?

We could then do a projection of 10 and split into up to 10? Would that be reasonable?

bwplotka avatar Nov 24 '25 08:11 bwplotka

Also, before we add some complexity, have you tried the compression option? https://github.com/GoogleCloudPlatform/prometheus-engine/blob/main/doc/api.md#monitoring.googleapis.com/v1.ConfigSpec

bwplotka avatar Nov 24 '25 14:11 bwplotka

> Also, before we add some complexity, have you tried the compression option? main/doc/api.md#monitoring.googleapis.com/v1.ConfigSpec

Yes. After compression it is 1.3 MB, and that is only the GlobalRules.

balakumarpg avatar Nov 25 '25 07:11 balakumarpg

> Nice, sounds like the plan would be to use projections and split by at least 3. Does that solve your use case? Do you have a more or less equal distribution of rules across those types?
>
> We could then do a projection of 10 and split into up to 10? Would that be reasonable?

Not equally distributed. We currently use only GlobalRules, but we could use Rules and ClusterRules as well. If the 1MB problem is solved, or the limit is effectively extended to 3MB, we can live with that for a while and distribute our alerts across these 3 types.

balakumarpg avatar Nov 25 '25 07:11 balakumarpg