chartmuseum
Slow to start and unstable with 7000 charts in GCP
We have had quite a lot of failures recently due to chartmuseum being super slow when rebuilding the index on the fly, having to parse 7000 charts in a Google Cloud Storage bucket.
Startup time is also dramatically impacted, taking up to 5 min. I had to raise the probes' initial delay to prevent Kubernetes from killing chartmuseum in a restart loop.
To reproduce slow start:
- Have a bucket with 7000 charts and no index-cache.yaml
- Deploy chartmuseum with DEBUG=true and initial probes delay set to 400s
- Look at the logs, in particular the delay until "Starting Chart museum" appears
To reproduce long running queries ending in timeout:
- With a 7000 chart bucket, and chartmuseum running
- Manually add a chart to the bucket
- Request /index.yaml
- Chartmuseum will notice a difference between the current index.yaml and the charts present, and will re-create the file on the fly, which takes minutes.
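To put a number on the "takes minutes" step, the request can be timed. This is only a rough sketch: `time_fetch` and `fetch_index` are helper names made up for this example, and the base URL you pass in is whatever address your own chartmuseum instance listens on.

```python
import time
import urllib.request


def time_fetch(fetch):
    """Call fetch() and return (result, elapsed_seconds)."""
    start = time.monotonic()
    result = fetch()
    return result, time.monotonic() - start


def fetch_index(base_url, timeout=600):
    """Download /index.yaml from a chartmuseum instance.

    base_url is a placeholder for your own deployment; the generous
    timeout is an assumption so a multi-minute rebuild is not cut short.
    """
    url = base_url.rstrip("/") + "/index.yaml"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

For example, `body, elapsed = time_fetch(lambda: fetch_index("http://localhost:8080"))` (hypothetical address) gives the response body and the wall-clock time in seconds. Running it once right after adding a chart to the bucket and once again afterwards should show the gap between a rebuilt and an unchanged index.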
Note that, after cleaning 5500 older charts from the repo, response and startup times improved.
Idea for a solution: If the time spent is waiting for I/O on the bucket, using batches to retrieve files may accelerate things (https://cloud.google.com/storage/docs/json_api/v1/how-tos/batch)
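If the time really is I/O wait, a storage-agnostic way to test the idea is plain client-side concurrency rather than the GCS batch API. The sketch below is not chartmuseum code and not the batch API from the link: `fetch_all` is a made-up helper, and `fetch_one` stands in for whatever function downloads a single chart from your bucket.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(names, fetch_one, max_workers=16):
    """Fetch every object in names concurrently and return {name: content}.

    fetch_one(name) is a caller-supplied (hypothetical) download function.
    An exception raised by any fetch_one call propagates to the caller.
    """
    names = list(names)  # materialize so names can be iterated twice
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(names, pool.map(fetch_one, names)))
```

Threads suit this case because the work is waiting on the network rather than CPU. Comparing the total time of `fetch_all` against a sequential loop over the same objects would indicate whether parallel retrieval is worth pursuing at all.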
It seems that batching is a platform-specific solution, though. As with the performance issue reported in #332, not only adding charts but also getting charts will be affected.
Maybe auto-purge would be a general workaround for you?
As of today I have ~22,000 charts on LTS: same setup, same problems. I enabled the statefile to mitigate the ~30 min boot and tried Redis, which achieved nothing: the index rebuilds at the same speed as if there were no Redis at all, and although I saw cache hits, they made no difference.

Problem number 2 is the lack of lifecycle management. I am currently trying to reduce the number of charts with external lifecycling.

Even after enabling statefile caching in the repo I still get a lot of problems. The main one is that a pull after cm-push, with a repo update, almost always fails, so I have a CI/CD retrier (60 tries, 3 seconds apart) to work around it while waiting for 0.14. At this point I have started looking into contributing to the project, as we are locked in.

On a side note, I am going to switch to a 2-instance setup with a warm and a cold bucket. I intend to keep around 1000 charts in warm and 14000 in cold. This is again a temporary setup until chartmuseum manages to get on top of it.
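For anyone who needs the same stopgap, a retrier of the kind described (60 tries, 3 seconds apart) can be sketched as below. It is an illustration, not the commenter's actual CI/CD script: `retry` is a made-up helper, and `action` is whatever callable performs the repo update and pull in your pipeline.

```python
import time


def retry(action, tries=60, delay=3.0, sleep=time.sleep):
    """Call action() until it succeeds, at most `tries` times.

    Waits `delay` seconds between attempts and re-raises the last error
    once the attempts are used up. `sleep` is injectable for testing.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    last_error = None
    for attempt in range(1, tries + 1):
        try:
            return action()
        except Exception as err:
            last_error = err
            if attempt < tries:
                sleep(delay)  # no wait after the final failed attempt
    raise last_error
```

With the defaults this gives up after roughly three minutes. It only masks the failure after a push and does nothing about the underlying index rebuild time.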