chartmuseum

Slow to start and unstable with 7000 charts in GCP

Open guneemwelloeux opened this issue 4 years ago • 2 comments

We have had quite a lot of failures recently due to chartmuseum being super slow when rebuilding the index on the fly, having to parse 7000 charts in a Google Cloud Storage bucket.

Startup time is also dramatically impacted, taking up to 5 min. I had to increase the probes' initial delay to prevent Kubernetes from loop-killing chartmuseum.

To reproduce slow start:

  • Have a bucket with 7000 charts, and no index-cache.yaml
  • Deploy chartmuseum with DEBUG=true and initial probes delay set to 400s
  • Look at the logs, in particular the delay until "Starting Chart museum" appears

To reproduce long running queries ending in timeout:

  • With a 7000 chart bucket, and chartmuseum running
  • Manually add a chart to the bucket
  • Request /index.yaml
  • Chartmuseum will notice a difference between the current index.yaml and the charts present, and will re-create the file on the fly, which takes minutes.
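
To put a number on the second reproduction, the request can be timed from outside the pod. The following is only a minimal sketch: the helper name `time_index_request` and the 600 s default timeout are assumptions made for illustration, and the base URL has to be replaced with the address of your own chartmuseum service.

```python
import time
import urllib.request


def time_index_request(base_url, timeout=600.0, opener=urllib.request.urlopen):
    """Fetch <base_url>/index.yaml and return (status, body size, seconds).

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    helper can be exercised without a running server.
    """
    url = base_url.rstrip("/") + "/index.yaml"
    start = time.monotonic()
    # A generous timeout is used because the rebuild reportedly takes minutes.
    with opener(url, timeout=timeout) as resp:
        body = resp.read()
        status = resp.status
    return status, len(body), time.monotonic() - start
```

Calling `time_index_request("http://localhost:8080")` (a hypothetical address) once before and once after manually adding a chart to the bucket should show the difference between a cached index and an on-the-fly rebuild.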

Note that, after cleaning 5500 older charts from the repo, response and startup times improved.

Idea for a solution: If the time spent is waiting for I/O on the bucket, using batches to retrieve files may accelerate things (https://cloud.google.com/storage/docs/json_api/v1/how-tos/batch)
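
If the time really is spent waiting on bucket I/O, a backend-agnostic variant of the same idea is to issue the per-chart fetches concurrently instead of one after another. The sketch below only illustrates that principle and is not chartmuseum code (chartmuseum itself is written in Go): `fetch_all`, `fetch_one`, and `max_workers=32` are hypothetical names and values, and `fetch_one` stands in for whatever call downloads a single chart from the bucket.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(names, fetch_one, max_workers=32):
    """Fetch every object in `names` concurrently and return {name: content}.

    `fetch_one(name)` is a hypothetical callable that downloads one object.
    Threads suit this case because the work is I/O-bound, not CPU-bound.
    """
    names = list(names)  # materialize once; the names are iterated twice below
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(names, pool.map(fetch_one, names)))
```

With N charts and a per-object latency of t, a sequential loop costs roughly N·t, while this approach costs roughly N·t divided by the number of workers, as long as the storage backend does not throttle the requests.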

guneemwelloeux avatar May 01 '20 15:05 guneemwelloeux

Batching seems to be a platform-specific solution, though. As reported in the performance issue #332, not only adding charts but also getting charts is affected.

Would auto-purge work as a general workaround for you?

scbizu avatar May 19 '20 17:05 scbizu

As of today I am sitting at ~22,000 charts on LTS with the same setup and the same problems. I enabled the statefile to mitigate the ~30 min boot and tried Redis, which utterly failed to achieve anything: the index rebuilds at the same speed after a run as if there were no Redis at all, and although I saw cache hits, they made no difference. Problem number 2 is that there is no lifecycle, so I am currently trying to reduce the number of charts with external lifecycling.

Even after enabling statefile caching in the repo I still get a lot of problems. The main one is that a pull after cm-push with a repo update almost always fails, so I have a CI/CD retrier (60 tries, 3 sec apart) to compensate while waiting for 0.14. At this point I have started looking into contributing to the project, as we are locked in.

On a side note, I am going to switch to a two-instance setup with a warm and a cold bucket. I intend to keep around 1,000 charts in warm and 14,000 in cold. This is again a temporary setup until chartmuseum manages to get on top of it.
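
For anyone needing the same stopgap, a retrier with those parameters (60 tries, 3 seconds apart) can be sketched as below. This is an illustration and not the actual CI/CD script: `retry` and `operation` are hypothetical names, and `operation` stands in for whatever step refreshes the repo and pulls the freshly pushed chart.

```python
import time


def retry(operation, attempts=60, delay=3.0, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` tries are used up.

    `operation` is a hypothetical zero-argument callable that raises on
    failure. `sleep` is a parameter only so the wait can be stubbed out.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # no sleep after the final failed attempt
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

With the defaults this waits at most 59 × 3 s, just under three minutes, before giving up, which bounds how long a pipeline can hang on a stale index.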

s7an-it avatar Dec 28 '21 11:12 s7an-it