proposal: internal index locking in HA setup

Open mailzyok opened this issue 5 years ago • 14 comments

Hi

We have tried chartmuseum, and it is great and very easy to use. We are curious whether chartmuseum supports an HA deployment on Kubernetes.

Thanks, yzha

mailzyok avatar Jul 11 '18 04:07 mailzyok

@mailzyok Thanks! Glad you're having a good experience.

In general, it should be safe to run chartmuseum across multiple replicas.

My only suggestion is to use Redis for the cache by specifying the following options:

  --cache="redis" \
  --cache-redis-addr="localhost:6379" \
  --cache-redis-password="" \
  --cache-redis-db=0

This keeps requests against your storage backend (chart downloads, etc.) to a minimum when building the repository index (index.yaml).

Please let us know if you experience any issues running in HA.

jdolitsky avatar Jul 11 '18 21:07 jdolitsky

@jdolitsky Thanks for the quick response. We will give it a try in the coming days and get back to you.

mailzyok avatar Jul 12 '18 02:07 mailzyok

@jdolitsky We are trying the HA deployment with multiple replicas and have some questions. What happens if two replicas try to build index.yaml at the same time? Will there be a conflict? What happens if two replicas try to upload the same chart archive at the same time? Will there be a conflict?

Thanks, yzha

mailzyok avatar Aug 15 '18 03:08 mailzyok

FYI: We have tried a simple HA setup, and we encountered some caching issues (reported in #175)

born4new avatar Oct 16 '18 07:10 born4new

@jdolitsky We are trying the HA deployment with multiple replicas and have some questions. What happens if two replicas try to build index.yaml at the same time? Will there be a conflict? What happens if two replicas try to upload the same chart archive at the same time? Will there be a conflict?

Thanks, yzha

@jdolitsky Any updates?

duyanghao avatar Jun 05 '20 13:06 duyanghao

@jdolitsky Any updates?

Abhishek1121-tech avatar Jun 20 '20 11:06 Abhishek1121-tech

@mailzyok Hey, may I ask whether you configured the cache mechanism, and whether all replicas use the same backend storage? I guess the outcome depends on the chart version: if both uploads have the same version, the second upload will be rejected unless --allow-overwrite is set.

scbizu avatar Jun 20 '20 18:06 scbizu

I've just tried running HA, and no, it doesn't work properly. The index is not regenerated in the second replica when a new chart is pushed. The DISABLE_STATEFILES=true option did not help either. We use a GCP storage bucket as the persistence layer.

r0kas avatar Apr 02 '21 10:04 r0kas

Stumbled into a race condition with 0.13.1.

We use Concourse. When a "put" step uses helm push to add a chart via the chartmuseum API, our Concourse resource immediately makes a GET request for the chart version it just pushed. I think this GET ends up on a chartmuseum pod that didn't receive the push request, and that pod ends up writing out the cache.

End result: the new chart was not in the index-cache.yaml file, and 2 of the pods reported the version missing, while the remaining pod would happily return the version and the chart from S3.
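To make the symptom above concrete, here is a toy model of three replicas that each answer reads from their own in-memory copy of the index. It is only a sketch: the SharedStorage and Replica classes and the chart name are hypothetical, and none of this mirrors chartmuseum's actual code or its index-cache.yaml handling.

```python
# Toy model only: these classes are hypothetical and do not mirror chartmuseum's code.

class SharedStorage:
    """Stands in for the shared bucket (for example S3) holding chart packages."""

    def __init__(self):
        self.charts = set()


class Replica:
    """A replica that keeps its own in-memory copy of the index."""

    def __init__(self, storage):
        self.storage = storage
        self.index = set(storage.charts)  # snapshot taken at startup

    def push(self, chart):
        # Only the replica that receives the upload updates its own index.
        self.storage.charts.add(chart)
        self.index.add(chart)

    def has_version(self, chart):
        # Reads are answered from the local index, not from storage.
        return chart in self.index


storage = SharedStorage()
pods = [Replica(storage) for _ in range(3)]

pods[0].push("mychart-1.2.3")
results = [pod.has_version("mychart-1.2.3") for pod in pods]
print(results)  # [True, False, False]
```

The chart is in shared storage, yet two of the three replicas report it missing until something refreshes their local view. That matches the behaviour described above, though the real cause in 0.13.1 may differ from this simplified model.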

samrees avatar Jun 08 '21 23:06 samrees

Any updates here? It's really important to be able to run this component in an HA setup.

ilyesAj avatar Dec 13 '21 14:12 ilyesAj

I imagine the easiest way to achieve high availability would be to allow running certain replicas in read-only mode and have one read-write instance. However, I have no idea whether this would work at the moment.
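As a rough illustration of the one-writer idea, the sketch below sends mutating HTTP methods to a single read-write instance and spreads everything else across read-only replicas. The Router class and the instance names are hypothetical, and this only covers routing: it assumes, without verification, that read requests on the replicas do not rewrite the index.

```python
import itertools

# Assumption: any of these methods may modify the repository.
WRITE_METHODS = {"POST", "PUT", "DELETE"}


class Router:
    """Hypothetical router: one writer, round-robin over read-only replicas."""

    def __init__(self, writer, readers):
        self.writer = writer
        self._readers = itertools.cycle(readers)

    def pick(self, method):
        if method.upper() in WRITE_METHODS:
            return self.writer
        return next(self._readers)


router = Router("cm-writer", ["cm-reader-0", "cm-reader-1"])
picks = [router.pick(m) for m in ["GET", "POST", "GET", "DELETE", "GET"]]
print(picks)
# ['cm-reader-0', 'cm-writer', 'cm-reader-1', 'cm-writer', 'cm-reader-0']
```

In a Kubernetes deployment the same split would more likely live in an ingress or proxy rule than in application code. Note that the writer is then a single point of failure for uploads, so this improves read availability only.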

sagikazarmark avatar Jan 13 '22 13:01 sagikazarmark

IMHO, to set up CM with multiple replicas, you should first use the external cache mechanism (Redis cluster), and all replicas should share the same storage path (the same S3 bucket, for example). The internal cache mechanism (local fs) will result in the cached index missing entries. I don't think there is a better way to achieve an HA setup right now.

As for the potential race or conflict condition: because the cache (Redis cluster) is updated before the index file (S3 bucket), races should be less likely. But I have never tried this kind of setup, so it would be appreciated if someone could try it and share the result with us XD. (I think I will have some time to follow up on this setup.)

scbizu avatar Jan 13 '22 18:01 scbizu

And I think there is no other better choice to achieve the HA setup.

The way things currently are, probably not. But with one write replica, you can reduce the chance of a race to practically zero (given read operations don't do any kind of indexing).

The ultimate solution IMO is introducing some sort of locking in the indexer (there is already an external cache which is pretty much redis only as far as I can tell, so why not use it for locking as well?)
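A minimal sketch of what such locking could look like, assuming the external store offers an atomic "set if absent with expiry" and an owner-checked delete (in Redis, SET with the NX and PX options provides the first). Everything here is hypothetical: InMemoryLockStore stands in for the external cache, regenerate_index stands in for the indexer, and none of it is chartmuseum's API.

```python
import threading
import time
import uuid


class InMemoryLockStore:
    """Hypothetical stand-in for an external store with set-if-absent and expiry."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._entries = {}  # key -> (owner token, expiry timestamp)

    def set_if_absent(self, key, token, ttl_seconds):
        now = time.monotonic()
        with self._mutex:
            current = self._entries.get(key)
            if current is not None and current[1] > now:
                return False  # another holder still owns the lock
            self._entries[key] = (token, now + ttl_seconds)
            return True

    def delete_if_owner(self, key, token):
        # Release only if we still own the lock, so an expired holder
        # cannot delete a lock that someone else has since acquired.
        with self._mutex:
            current = self._entries.get(key)
            if current is not None and current[0] == token:
                del self._entries[key]
                return True
            return False


def regenerate_index(store, replica_name, log, ttl_seconds=30.0):
    """Run the (simulated) index rebuild only if this replica wins the lock."""
    token = uuid.uuid4().hex
    if not store.set_if_absent("index-lock", token, ttl_seconds):
        log.append((replica_name, "skipped"))
        return False
    try:
        log.append((replica_name, "rebuilt"))  # the real rebuild would go here
        return True
    finally:
        store.delete_if_owner("index-lock", token)


store = InMemoryLockStore()
log = []

# Simulate replica-a holding the lock while replica-b tries to rebuild.
held = store.set_if_absent("index-lock", "token-a", 30.0)
regenerate_index(store, "replica-b", log)
store.delete_if_owner("index-lock", "token-a")
regenerate_index(store, "replica-b", log)
print(log)  # [('replica-b', 'skipped'), ('replica-b', 'rebuilt')]
```

The expiry keeps a crashed replica from holding the lock forever, and the owner token keeps a slow replica from releasing a lock it no longer holds. A real implementation would also need to decide what a skipped replica does next, for example retrying or reloading the index that the lock holder produced.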

Even if the external cache might be enough to run Chartmuseum in HA, I absolutely understand why people hesitate to accept that. I also think there is room for improvement.

sagikazarmark avatar Jan 13 '22 21:01 sagikazarmark

Even if the external cache might be enough to run Chartmuseum in HA, I absolutely understand why people hesitate to accept that. I also think there is room for improvement.

Yeah, totally agree. Internal index locking needs more development work, though. The solution I gave is based only on the current codebase (v0.13.1 or the 0.14.0 pre-release), and I know it's not perfect.

scbizu avatar Jan 14 '22 02:01 scbizu