cortex-helm-chart General defaults need update

Deploying the chart as is with the defaults generally results in a poorly performing instance. The cortex-jsonnet project makes a number of changes to these defaults that we should adopt here. General buckets for these changes:

Memcached
- Need to provide cortex configuration for memcached to improve performance and match settings provided to the memcached charts
- Should provide additional options for the memcached charts by default to increase max connections and max item size. This should be tied to the same override for the cortex memcached client config
Pod sizes
- In general, PV sizes are too small and should be changed
- Memory and CPU requests should be added to each tier and extra args should be provided to tune to that size (grpc/server). Scaling the tier should be done by changing the replica count, not messing with the other settings. We can provide a small.yaml override for folks that want to give cortex a try, but anyone attempting to run cortex for real should be using properly sized pods and we should mention this in the readme.
- store gateway should be sharded by default
cortex config
- this needs a lot of work.
- lazy index loading should be enabled by default
- meta and tenant sync concurrency should be templatable in values.yaml with some comments around increasing if you are targeting a larger single tenant vs many small tenants
- block retention and the querier query_store_after / query_ingesters_within behavior should be defaulted and have comments explaining how they work, potentially being templatable (these three settings are tied together, so having a single values setting for them would be nice)
- heartbeat periods and timeouts should be increased for both distributors and ingesters
- compactor should have deletion delay set to 1h
- store gateway sharding should be enabled by default
- distributors should extend writes and have a longer remote timeout
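
As a rough illustration of the cortex config bullets above, the override below shows where each setting would live under the chart's `config` key. It is a minimal sketch, not a tested default: the key names follow the Cortex configuration reference as I understand it, the 1h deletion delay comes from this issue, and every other duration or count is a placeholder chosen only to show how the settings relate.

```yaml
# Sketch only: assumes the chart passes `config` through to Cortex unchanged.
# All values except compactor.deletion_delay are illustrative placeholders.
config:
  blocks_storage:
    tsdb:
      # How long ingesters keep blocks on local disk.
      retention_period: 13h
    bucket_store:
      # Load index headers on first use instead of at startup.
      index_header_lazy_loading_enabled: true
      # Consider raising meta_sync_concurrency for one large tenant and
      # tenant_sync_concurrency for many small tenants.
      meta_sync_concurrency: 20
      tenant_sync_concurrency: 10
  querier:
    # Tied to retention_period above. A common arrangement is
    # query_store_after < query_ingesters_within <= retention_period,
    # so ingesters and the store overlap instead of leaving a gap.
    query_ingesters_within: 13h
    query_store_after: 12h
  store_gateway:
    # Only useful when the store gateway runs with more than one replica.
    sharding_enabled: true
  compactor:
    deletion_delay: 1h
  distributor:
    extend_writes: true
    remote_timeout: 20s
    ring:
      heartbeat_period: 15s
      heartbeat_timeout: 10m
  ingester:
    lifecycler:
      heartbeat_period: 15s
      ring:
        heartbeat_timeout: 10m
```

Keeping the three retention and query windows next to each other, with a comment on how they interact, is one way to get the "single values setting" feel without new template logic.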

May 27 '22 14:05 justinrush

If these sound OK in general, I'm happy to submit a PR.

May 27 '22 17:05 justinrush

> If these sound OK in general, I'm happy to submit a PR

Yes, some of these really depend on the size of the deployment, but generally speaking these definitely sound like a good idea.

May 27 '22 19:05 nschad

> If these sound OK in general, I'm happy to submit a PR

Hey, just following up. Do you still wish to submit a PR? Is there anything I can do?

Jul 02 '22 20:07 nschad

I've been looking into this, and some of the settings I'm going to add are generally applicable regardless of the size of the deployment. However, some are only appropriate when certain other conditions are true, such as a store gateway replica count greater than 1, various memcached instances being enabled, etc. What are folks' thoughts on templating the content of the config value so we can put Go templates in there? We could then add some values settings outside of config to enable things.

Alternatively, we could provide a small_values.yaml file with settings more appropriate for small or proof-of-concept deployments, which is basically what values.yaml is today. Then we could assume things most production deployments likely use, such as:

- memcached for all tiers
- resource definitions for ingesters
- store gateway scale out
- appropriate PV sizes
- etc.

And have this be the default values.yaml.
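
To make the production-leaning defaults listed above concrete, here is a hedged sketch of what such a values.yaml could contain. The top-level key names (`ingester`, `store_gateway`, `memcached-blocks-index`) and the `enabled` toggle reflect my reading of the chart and may differ between chart versions; the `MEMCACHED_*` environment variables are assumed to be honored by the memcached image the subchart uses; all sizes are placeholders, not recommendations.

```yaml
# Hypothetical production-leaning defaults. Key names and sizes are
# assumptions for illustration; check them against the chart version in use.
ingester:
  replicas: 3              # scale the tier here, not via the settings below
  resources:
    requests:
      cpu: "4"
      memory: 15Gi
  persistentVolume:
    size: 100Gi

store_gateway:
  replicas: 3              # more than 1, so sharding is meaningful
  persistentVolume:
    size: 50Gi

memcached-blocks-index:
  enabled: true            # assumed toggle; some versions may use another switch
  extraEnvVars:
    - name: MEMCACHED_MAX_CONNECTIONS
      value: "4096"
    - name: MEMCACHED_MAX_ITEM_SIZE
      value: "5242880"     # 5 MiB, kept equal to the client setting below

config:
  store_gateway:
    sharding_enabled: true
  blocks_storage:
    bucket_store:
      index_cache:
        backend: memcached
        memcached:
          # Client-side limit; should not exceed the server's max item size.
          max_item_size: 5242880
```

A small_values.yaml for trying Cortex out would then override these back down, for example with fewer replicas, no resource requests, and the memcached instances disabled.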

Or maybe this chart could assume a distributed non-poc deployment and we could point folks towards whatever comes of this issue? https://github.com/cortexproject/cortex-helm-chart/issues/378 ... looks to be going the way of a standalone chart?

Jul 26 '22 07:07 justinrush

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

Sep 20 '22 23:09 stale[bot]
