
Reindex is too slow in repos with thousands of charts

drewwells opened this issue 1 year ago · 3 comments

We have 150k charts, a 2.3 GB bucket, and a 40 MB index file. I ran reindex in a Kubernetes pod inside AWS and it took a week to complete. I can suggest a few optimizations just from looking at the code.

Reindex downloads each chart as it iterates through the index. Instead, it could:

- sync the entire S3 bucket locally first and then process the local copy, or
- trust the old index and only download objects that are missing from it, and
- optimize or expose the S3 client configuration options; see https://stackoverflow.com/a/48114553
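The core of the speedup is fetching charts concurrently instead of one sequential GET per chart. A minimal sketch of that pattern with a bounded worker pool follows; `downloadAll` and the `fetch` callback are illustrative names and stand in for an S3 `GetObject` call, they are not part of helm-s3's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// downloadAll fetches every object key using up to `workers` concurrent
// fetches. `fetch` is a stand-in for an S3 GetObject call.
func downloadAll(keys []string, workers int, fetch func(key string) error) []error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	jobs := make(chan string)

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for key := range jobs {
				if err := fetch(key); err != nil {
					mu.Lock()
					errs = append(errs, err)
					mu.Unlock()
				}
			}
		}()
	}
	for _, k := range keys {
		jobs <- k
	}
	close(jobs)
	wg.Wait()
	return errs
}

func main() {
	keys := []string{"charts/a-1.0.0.tgz", "charts/b-2.0.0.tgz"}
	errs := downloadAll(keys, 8, func(key string) error {
		fmt.Println("fetched", key)
		return nil
	})
	fmt.Println("errors:", len(errs))
}
```

With S3, throughput scales roughly with the number of concurrent requests until you hit client or network limits, which is why exposing the client configuration matters too.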

With the above optimizations it takes me 5 minutes to download all the objects; I'd guess it would take hours with the default settings.

drewwells avatar Nov 28 '23 19:11 drewwells

Wow, that's a real number. I agree that the plugin was never designed to handle such volumes, and there is definitely room for improvement.

Out of curiosity, why do you have so many charts? Have you considered cleaning up, e.g. removing unused versions?

hypnoglow avatar Nov 28 '23 22:11 hypnoglow

We have, actually, but that itself is a challenge. Each delete (like each upload) takes a considerable amount of time, and probably 99% of the references need to be cleaned. The criteria I can think of are: remove any chart older than {time period}, and also consult an allow list of chart versions that are currently in use. I thought about it for a while, and forking helm-s3 is probably the easiest way for us to fine-tune reindex with a prune option that applies those criteria.

drewwells avatar Nov 28 '23 23:11 drewwells

At a guess, I would assume that deletion takes a while because helm-s3 downloads, edits, and uploads the index on every delete. With large repos the index gets pretty beefy, so this adds a lot of time. Would it be possible to support deleting multiple versions in one go?
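Batching would amortize that per-delete index round trip: download the index once, drop all the requested versions in memory, and upload it once. A rough sketch, assuming a simplified in-memory index (the real helm-s3 index is a YAML file stored in the bucket; the names here are illustrative):

```go
package main

import "fmt"

// index is a minimal in-memory stand-in for the repo index:
// chart name -> list of versions.
type index map[string][]string

// deleteVersions removes many chart versions from the index in one pass,
// so the index only needs one download and one re-upload, instead of one
// round trip per deleted version.
func deleteVersions(idx index, toDelete map[string][]string) {
	for name, versions := range toDelete {
		drop := make(map[string]bool, len(versions))
		for _, v := range versions {
			drop[v] = true
		}
		var kept []string
		for _, v := range idx[name] {
			if !drop[v] {
				kept = append(kept, v)
			}
		}
		if len(kept) == 0 {
			delete(idx, name) // no versions left: drop the chart entirely
		} else {
			idx[name] = kept
		}
	}
}

func main() {
	idx := index{"app": {"1.0.0", "1.1.0", "2.0.0"}}
	deleteVersions(idx, map[string][]string{"app": {"1.0.0", "1.1.0"}})
	fmt.Println(idx) // map[app:[2.0.0]]
}
```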

Makeshift avatar Jan 25 '24 13:01 Makeshift