krr Prometheus workload loader

[x] Implemented ability to select an alternative workload loader
[x] Implemented Prometheus workload loader
[x] Refactored Kube API workload loader: moved separate logic (for each Kind) to a separate class
[x] List clusters on centralized prometheus
[x] Scanning multiple clusters in prometheus mode?
[ ] Test running from robusta UI
[ ] Test prometheus mode on centralized prometheus
[x] A lot of testing
[x] Fix tests (mocks are now broken)

Testing issues found:

[x] For some reason auto-discovery does not work
[x] Prometheus mode does not gather HPA data
[x] Something is now broken with HPAKey

Apr 22 '24 12:04 LeaveMyYard

When testing using the prometheus mode on a centralized Prometheus (VictoriaMetrics), I encountered the following errors. I specified the labels --prometheus-label and --prometheus-cluster-label according to the docs:

python krr.py simple -p https://vmauth.xxx.com --prometheus-auth-header "<redacted>" --prometheus-label cluster -l k8s0-q --mode prometheus

To me it seems that the query includes the wrong label here, and should be avg by(cluster) instead of using the cluster name.

May 22 '24 10:05 deutschj

Hi @deutschj, You're correct of course! I've pushed a fix. Can you please test?

Jun 14 '24 05:06 aantn

@deutschj please note that I changed the name of the flags a little -- see --help if necessary.

Jun 14 '24 05:06 aantn

@aantn Sure, thanks a lot for working on the bugfix!

Whett testing with the options --prometheus-cluster-key and -l I encountered the following error:

krr git:(prometheus-workload-loader) python krr.py simple --prometheus-url https://vmauth.xxx.com --prometheus-auth-header "<redacted>" --mode prometheus --prometheus-cluster-key cluster -l k8s0-q

Running Robusta's KRR (Kubernetes Resource Recommender) 1.8.2-dev
Using strategy: Simple
Using formatter: table
A newer version of KRR is available: v1.11.0

[08:22:56] INFO     Connecting using Prometheus, will load the kubeconfig.
           INFO     Using Prometheus at https://vmauth.xxx.com
           INFO     Prometheus found
           INFO     Prometheus connected successfully            
[08:22:57] INFO     Clusters available: k8s0-q, <cluster1>, <cluster2>, [...]
           CRITICAL Cannot scan multiple clusters for this prometheus, Rerun with the flag `-c <cluster>` where <cluster> is one of ['k8s0-q', '<cluster1>', '<cluster2>', [...]]

However, when providing only the -l option without the --prometheus-cluster-key, KRR starts to generate recommendations for the selected cluster. When generating, the following warnings are displayed for every existing workload:

WARNING  Prometheus returned no PercentileCPULoader metrics for StatefulSet xxx
WARNING  Prometheus returned no MaxMemoryLoader metrics for StatefulSet xxx
WARNING  Prometheus returned no CPUAmountLoader metrics for StatefulSet xxx
WARNING  Prometheus returned no MemoryAmountLoader metrics for StatefulSet xxx
INFO     Calculated recommendations for StatefulSet xxx (using 4 metrics)

And the table with the generated resource recommendations in the end is empty though. So looks like now getting the existing workloads in the cluster works correctly, but getting the corresponding metrics from the centralized Prometheus doesn't yet.

Jun 27 '24 06:06 deutschj

@deutschj does it work if you provide --prometheus-cluster-key, -l, and -c with -c set to the same value as -l?

Jun 28 '24 21:06 aantn

@aantn Unfortunately not, no - this yields the same error as above, CRITICAL Cannot scan multiple clusters for this prometheus, Rerun with the flag '-c <cluster>' where <cluster> is one of [...]

Jul 02 '24 10:07 deutschj

Would love to see this merged, using KRR from this feature branch for some time now

Jul 08 '24 08:07 Pionerd

Hey, we're planning to get it merged, but no exact ETA yet.

Jul 08 '24 11:07 aantn

hello. Are there any plans for this branch to be merged?

Aug 12 '24 14:08 wad-hongsumin

hey @wad-hongsumin

We do plan to merge it, hopefully soon

Aug 12 '24 14:08 arikalon1

@arikalon1 thank you. I'm hoping this PR gets merged quickly.

Aug 12 '24 22:08 wad-hongsumin

krr krr copied to clipboard

Prometheus workload loader

krr
krr copied to clipboard