Helm Dashboard Slow Performance

Open andriktr opened this issue 1 year ago • 9 comments

Description

Hey, we recently deployed helm-dashboard in our dev cluster, which has hundreds of namespaces and Helm charts. Everything is fine except the performance of the helm-dashboard UI. It can take minutes to load the chart list or show the manifests of a specific chart.
In the helm-dashboard log we see that simple API requests take tens of seconds or even minutes to complete:

[GIN] 2024/06/10 - 12:47:33 | 200 |         2m28s |   10.162.215.11 | GET      "/api/helm/releases/swarm-gateway-stage/swarm-gateway-stage/resources?health=true"
[GIN] 2024/06/10 - 12:47:35 | 200 |         2m29s |   10.162.215.11 | GET      "/api/helm/releases/static-site-storage-cleanup-jobs-dev/static-site-storage-cleanup-jobs-dev/resources?health=true"
[GIN] 2024/06/10 - 12:47:37 | 200 |         2m31s |   10.162.215.11 | GET      "/api/helm/releases/quick-search-stage/quick-search-stage/resources?health=true"
[GIN] 2024/06/10 - 12:47:38 | 200 |         2m33s |   10.162.215.11 | GET      "/api/helm/releases/pdf-composer-dev/pdf-composer-dev/resources?health=true"
[GIN] 2024/06/10 - 12:47:39 | 200 |       37.05µs |   10.162.215.11 | GET      "/status"
[GIN] 2024/06/10 - 12:47:39 | 200 |      41.748µs |   10.162.215.11 | GET      "/status"
[GIN] 2024/06/10 - 12:47:40 | 200 |         2m34s |   10.162.215.11 | GET      "/api/helm/releases/vehicle-brand-model-sync-dev/vehicle-brand-model-sync-dev/resources?health=true"
[GIN] 2024/06/10 - 12:47:42 | 200 |         2m36s |   10.162.215.11 | GET      "/api/helm/releases/secret-manager-stage/secret-manager-stage/resources?health=true"
[GIN] 2024/06/10 - 12:47:43 | 200 |         2m38s |   10.162.215.11 | GET      "/api/helm/releases/vehicle-brand-model-sync-stage/vehicle-brand-model-sync-stage/resources?health=true"
[GIN] 2024/06/10 - 12:47:45 | 200 |         2m39s |   10.162.215.11 | GET      "/api/helm/releases/pricing-engine-job-dev/pricing-engine-job-dev/resources?health=true"
[GIN] 2024/06/10 - 12:47:47 | 200 |         2m41s |   10.162.215.11 | GET      "/api/helm/releases/pricing-engine-stage/pricing-engine-stage/resources?health=true"
[GIN] 2024/06/10 - 12:47:48 | 200 |         2m43s |   10.162.215.11 | GET      "/api/helm/releases/motor-registry-lv-dev/motor-registry-lv-dev/resources?health=true"
[GIN] 2024/06/10 - 12:47:49 | 200 |      52.039µs |   10.162.215.11 | GET      "/status"
[GIN] 2024/06/10 - 12:47:49 | 200 |      34.271µs |   10.162.215.11 | GET      "/status"
[GIN] 2024/06/10 - 12:47:50 | 200 |         2m44s |   10.162.215.11 | GET      "/api/helm/releases/policies-dev/policies-dev/resources?health=true"
[GIN] 2024/06/10 - 12:47:52 | 200 |         2m46s |   10.162.215.11 | GET      "/api/helm/releases/quick-search-v2-dev/quick-search-v2-dev/resources?health=true"
[GIN] 2024/06/10 - 12:47:53 | 200 |         2m48s |   10.162.215.11 | GET      "/api/helm/releases/profile-dev/profile-dev/resources?health=true"
[GIN] 2024/06/10 - 12:47:55 | 200 |         2m49s |   10.162.215.11 | GET      "/api/helm/releases/pricing-engine-dev/pricing-engine-dev/resources?health=true"
[GIN] 2024/06/10 - 12:47:57 | 200 |         2m51s |   10.162.215.11 | GET      "/api/helm/releases/product-packages-dev/product-packages-dev/resources?health=true"
[GIN] 2024/06/10 - 12:47:58 | 200 |         2m53s |   10.162.215.11 | GET      "/api/helm/releases/profile-doors-sync-dev/profile-doors-sync-dev/resources?health=true"
[GIN] 2024/06/10 - 12:47:59 | 200 |      45.185µs |   10.162.215.11 | GET      "/status"
[GIN] 2024/06/10 - 12:47:59 | 200 |      36.113µs |   10.162.215.11 | GET      "/status"
[GIN] 2024/06/10 - 12:48:00 | 200 |         2m54s |   10.162.215.11 | GET      "/api/helm/releases/secret-manager-dev/secret-manager-dev/resources?health=true"
[GIN] 2024/06/10 - 12:48:02 | 200 |         2m56s |   10.162.215.11 | GET      "/api/helm/releases/swarm-gateway-dev/swarm-gateway-dev/resources?health=true"
[GIN] 2024/06/10 - 12:48:03 | 200 |         2m58s |   10.162.215.11 | GET      "/api/helm/releases/saikas-dms-middleware-stage/saikas-dms-middleware-stage/resources?health=true"
[GIN] 2024/06/10 - 12:48:05 | 200 |         2m59s |   10.162.215.11 | GET      "/api/helm/releases/profile-sales-sync-dev/profile-sales-sync-dev/resources?health=true"
[GIN] 2024/06/10 - 12:48:07 | 200 |          3m1s |   10.162.215.11 | GET      "/api/helm/releases/saikas-proxy-dev/saikas-proxy-dev/resources?health=true"
[GIN] 2024/06/10 - 12:48:08 | 200 |          3m3s |   10.162.215.11 | GET      "/api/helm/releases/quick-search-v2-stage/quick-search-v2-stage/resources?health=true"
[GIN] 2024/06/10 - 12:48:09 | 200 |       34.01µs |   10.162.215.11 | GET      "/status"
[GIN] 2024/06/10 - 12:48:09 | 200 |      35.801µs |   10.162.215.11 | GET      "/status"
[GIN] 2024/06/10 - 12:48:10 | 200 |          3m5s |   10.162.215.11 | GET      "/api/helm/releases/profile-stage/profile-stage/resources?health=true"
[GIN] 2024/06/10 - 12:48:12 | 200 |          3m6s |   10.162.215.11 | GET      "/api/helm/releases/time-machine-dev/time-machine-dev/resources?health=true"
[GIN] 2024/06/10 - 12:48:14 | 200 |          3m8s |   10.162.215.11 | GET      "/api/helm/releases/saikas-proxy-stage/saikas-proxy-stage/resources?health=true"
[GIN] 2024/06/10 - 12:48:15 | 200 |         3m10s |   10.162.215.11 | GET      "/api/helm/releases/profile-sales-sync-stage/profile-sales-sync-stage/resources?health=true"
[GIN] 2024/06/10 - 12:48:17 | 200 |         3m11s |   10.162.215.11 | GET      "/api/helm/releases/product-packages-stage/product-packages-stage/resources?health=true"
[GIN] 2024/06/10 - 12:48:19 | 200 |      31.666µs |   10.162.215.11 | GET      "/status"
[GIN] 2024/06/10 - 12:48:19 | 200 |      37.349µs |   10.162.215.11 | GET      "/status"
[GIN] 2024/06/10 - 12:48:19 | 200 |         3m13s |   10.162.215.11 | GET      "/api/helm/releases/seb-hh-portfolio-stage/seb-hh-portfolio-stage/resources?health=true"
[GIN] 2024/06/10 - 12:48:20 | 200 |         2m26s |   10.162.215.11 | GET      "/api/helm/releases"
[GIN] 2024/06/10 - 12:48:21 | 204 |      54.039µs |   10.162.215.11 | GET      "/api/helm/repositories/latestver?name=platform-dotnet-chart"
[GIN] 2024/06/10 - 12:48:21 | 200 |      43.608µs |   10.162.215.11 | GET      "/api/helm/repositories/latestver?name=heartbeat"
[GIN] 2024/06/10 - 12:48:21 | 204 |      43.804µs |   10.162.215.11 | GET      "/api/helm/repositories/latestver?name=heartbeat-prerequisites"
[GIN] 2024/06/10 - 12:48:21 | 204 |      35.803µs |   10.162.215.11 | GET      "/api/helm/repositories/latestver?name=import-map-deployer"
[GIN] 2024/06/10 - 12:48:21 | 200 |        80.3µs |   10.162.215.11 | GET      "/static/helm-gray-50.svg"
[GIN] 2024/06/10 - 12:48:23 | 200 |  1.915285787s |   10.162.215.11 | GET      "/api/helm/releases/accident-dev/accident-dev/resources?health=true"
[GIN] 2024/06/10 - 12:48:24 | 200 |  3.544809793s |   10.162.215.11 | GET      "/api/helm/releases/accident-stage/accident-stage/resources?health=true"
[GIN] 2024/06/10 - 12:48:26 | 200 |  5.245965256s |   10.162.215.11 | GET      "/api/helm/releases/activity-tracker-dev/activity-tracker-dev/resources?health=true"
[GIN] 2024/06/10 - 12:48:28 | 200 |  6.914257095s |   10.162.215.11 | GET      "/api/helm/releases/activity-tracker-stage/activity-tracker-stage/resources?health=true"
[GIN] 2024/06/10 - 12:48:29 | 200 |      38.601µs |   10.162.215.11 | GET      "/status"
[GIN] 2024/06/10 - 12:48:29 | 200 |       40.45µs |   10.162.215.11 | GET      "/status"
[GIN] 2024/06/10 - 12:48:29 | 200 |   8.62643479s |   10.162.215.11 | GET      "/api/helm/releases/ap-sms-update-dev/ap-sms-update-dev/resources?health=true"

Any suggestions on how to improve the performance? Container compute resource usage looks like this: image, which is 10% of the limit.
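
The latency column can be checked mechanically. The sketch below is TypeScript and assumes the default GIN access-log layout shown above (fields separated by `|`, Go-style durations such as `2m28s`). The helper names `parseGoDuration`, `parseGinLine`, and `latencySteps` are made up for this illustration. It computes how much the latency grows from one `resources?health=true` request to the next.

```typescript
// Convert a Go-style duration ("2m28s", "3m1s", "1.915285787s", "37.05µs") to seconds.
function parseGoDuration(text: string): number {
  const unitSeconds: Record<string, number> = {
    h: 3600, m: 60, s: 1, ms: 1e-3, "µs": 1e-6, us: 1e-6, ns: 1e-9,
  };
  // Longer unit names come first so "ms" is not read as "m" followed by "s".
  const part = /(\d+(?:\.\d+)?)(h|ms|µs|us|ns|m|s)/g;
  let total = 0;
  let consumed = 0;
  let match: RegExpExecArray | null;
  while ((match = part.exec(text)) !== null) {
    total += parseFloat(match[1]) * unitSeconds[match[2]];
    consumed += match[0].length;
  }
  if (consumed === 0 || consumed !== text.length) {
    throw new Error("unrecognized duration: " + text);
  }
  return total;
}

interface GinEntry {
  latencySeconds: number;
  path: string;
}

// Assumed field order: timestamp | status | latency | client IP | method "path".
function parseGinLine(line: string): GinEntry | null {
  const fields = line.split("|").map((field) => field.trim());
  if (fields.length < 5 || fields[0].indexOf("[GIN]") !== 0) return null;
  const pathMatch = /"([^"]*)"/.exec(fields[4]);
  if (pathMatch === null) return null;
  return { latencySeconds: parseGoDuration(fields[2]), path: pathMatch[1] };
}

// Difference in latency between consecutive per-release resource requests.
function latencySteps(lines: string[]): number[] {
  const latencies: number[] = [];
  for (const line of lines) {
    const entry = parseGinLine(line);
    if (entry !== null && entry.path.indexOf("/resources?") !== -1) {
      latencies.push(entry.latencySeconds);
    }
  }
  const steps: number[] = [];
  for (let i = 1; i < latencies.length; i++) {
    steps.push(latencies[i] - latencies[i - 1]);
  }
  return steps;
}

// Four lines copied from the log above.
const sample = [
  '[GIN] 2024/06/10 - 12:48:23 | 200 |  1.915285787s |   10.162.215.11 | GET      "/api/helm/releases/accident-dev/accident-dev/resources?health=true"',
  '[GIN] 2024/06/10 - 12:48:24 | 200 |  3.544809793s |   10.162.215.11 | GET      "/api/helm/releases/accident-stage/accident-stage/resources?health=true"',
  '[GIN] 2024/06/10 - 12:48:26 | 200 |  5.245965256s |   10.162.215.11 | GET      "/api/helm/releases/activity-tracker-dev/activity-tracker-dev/resources?health=true"',
  '[GIN] 2024/06/10 - 12:48:28 | 200 |  6.914257095s |   10.162.215.11 | GET      "/api/helm/releases/activity-tracker-stage/activity-tracker-stage/resources?health=true"',
];
console.log(latencySteps(sample));
```

On these four lines each step is roughly 1.6 to 1.7 seconds. A near-constant step like this is what one would expect if the requests were started together and then completed one at a time, although the log alone does not prove that.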

Screenshots

image image image

Additional information

No response

andriktr avatar Jun 10 '24 12:06 andriktr

With large and numerous charts, the requests for status and health take a while. The problem exists, and we need to find a way to fix it.

undera avatar Jun 10 '24 13:06 undera

The charts are actually pretty small. On the last screenshot you can see that loading the history URL took almost 3 minutes, which is not "a while", I would say :). Glad you are already aware of the issue.

andriktr avatar Jun 10 '24 14:06 andriktr

Yes, the performance of helm-dashboard degrades significantly when I have 50+ charts in my cluster. Most of the time, dashboard requests to the /release and /history API endpoints take approximately 1 minute, and many requests time out. Such slow performance makes the dashboard useless. Please note that I have provided 2 vCPU cores and performance still has not improved. image

Hemant-Pardeshi avatar Jun 24 '24 09:06 Hemant-Pardeshi

Hi @undera, do you have any plans to work on performance improvements?

Hemant-Pardeshi avatar Jun 27 '24 09:06 Hemant-Pardeshi

Right now, my main job takes most of my time. I'm open to contributions and collaboration, though.

undera avatar Jun 27 '24 10:06 undera

@harshit-mehtaa and @andriktr, we are open to contributions :) Another option is to use Komodor, where we have those capabilities and much more, designed to scale (hundreds of Helm charts, thousands of clusters).

itielshwartz avatar Jul 18 '24 16:07 itielshwartz

I recently faced performance issues using helm-dashboard (my own fork). Here's what I did to improve it:

  1. I modified the frontend of the installed list to load lazily by suspending the resource API call until the release is in view. I introduced a dependency to do this: import { useInView } from 'react-intersection-observer'
  2. I cached some "hot" resources, such as ConfigMaps, Deployments, and ServiceAccounts, using shared index informers; GetResourceInfo looks them up in the cache first. All informers are managed by an LRU cache with a configurable size. I used the github.com/hashicorp/golang-lru/v2 library for this, which is pretty handy.

The cache part should be carefully designed to prevent frequent initial lists, which may introduce great overhead on the apiserver and network.
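
The lazy-loading idea from the first item can be sketched independently of React. This is only an illustration under assumptions, not the code from the fork: `LazyResource` and `onVisibilityChange` are hypothetical names, and the boolean passed in stands for a visibility flag such as the `inView` value that `useInView` provides. The point is that the expensive request starts only once, the first time the row becomes visible.

```typescript
// Wraps an expensive request and defers it until the row is reported visible.
class LazyResource<T> {
  private load: () => Promise<T>;
  private pending: Promise<T> | null = null;

  constructor(load: () => Promise<T>) {
    this.load = load;
  }

  // Call on every visibility change; returns null until the first request starts.
  onVisibilityChange(inView: boolean): Promise<T> | null {
    if (inView && this.pending === null) {
      // First time in view: issue the request exactly once.
      this.pending = this.load();
    }
    return this.pending;
  }
}

// Usage with a stand-in for the per-release resources request.
let requests = 0;
const row = new LazyResource<string>(async () => {
  requests += 1;
  return "healthy";
});

row.onVisibilityChange(false); // off-screen: nothing is requested
row.onVisibilityChange(true);  // scrolled into view: the request starts
row.onVisibilityChange(true);  // still visible: the same promise is reused
console.log(requests);
```

With hundreds of releases in the list, this keeps the number of in-flight resource requests proportional to what is on screen rather than to the size of the cluster.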

wylswz avatar Mar 27 '25 05:03 wylswz

@wylswz Thanks for sharing your findings. Would you be willing to contribute your FE changes from #1 to the project via a PR?

Regarding the cache: it needs to be carefully considered, because of the risk of showing outdated data after it has changed in the cluster.
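
One common way to bound how stale a cached entry can get is to give each entry a maximum age in addition to a size limit. The sketch below illustrates that trade-off in TypeScript. It is not the code from the fork, which according to the comment above uses Go shared index informers and golang-lru; `ResourceCache`, `maxAgeMs`, and `fetchFresh` are hypothetical names chosen for this example.

```typescript
interface CacheEntry<V> {
  value: V;
  storedAtMs: number;
}

// Read-through cache with LRU eviction and a maximum age per entry.
class ResourceCache<V> {
  private entries = new Map<string, CacheEntry<V>>();
  private capacity: number;
  private maxAgeMs: number;
  private now: () => number;

  constructor(capacity: number, maxAgeMs: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.maxAgeMs = maxAgeMs;
    this.now = now;
  }

  get(key: string, fetchFresh: () => V): V {
    const hit = this.entries.get(key);
    if (hit !== undefined && this.now() - hit.storedAtMs < this.maxAgeMs) {
      // Fresh hit: re-insert so this key becomes the most recently used.
      this.entries.delete(key);
      this.entries.set(key, hit);
      return hit.value;
    }
    // Miss or stale entry: go to the source and store the result.
    const value = fetchFresh();
    this.entries.delete(key);
    this.entries.set(key, { value: value, storedAtMs: this.now() });
    if (this.entries.size > this.capacity) {
      // A Map iterates in insertion order, so its first key is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    return value;
  }
}

// Usage with a fake clock so the expiry is easy to follow.
let clockMs = 0;
let fetches = 0;
const cache = new ResourceCache<string>(2, 5000, () => clockMs);
const fetchDeployment = () => {
  fetches += 1;
  return "deployment snapshot " + fetches;
};

cache.get("dev/deployment/a", fetchDeployment); // miss: fetched from the source
cache.get("dev/deployment/a", fetchDeployment); // fresh hit: served from the cache
clockMs = 6000;
cache.get("dev/deployment/a", fetchDeployment); // older than maxAgeMs: fetched again
console.log(fetches);
```

A shorter `maxAgeMs` shows changes sooner but sends more requests to the API server, so the value is a tuning decision rather than a fixed answer.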

undera avatar Mar 27 '25 10:03 undera

Yes, I can do that.

wylswz avatar Mar 27 '25 10:03 wylswz