cortex-helm-chart
Create a production-ready values file
Most users of the helm chart have to specify a few things in values.yaml to get their Cortex deployment to production-ready status. We should create an alternate values file to make this path easier, perhaps copying some of the values specified in cortex-jsonnet.
@friedrichg, perhaps you can start by providing what you believe is a good values.yaml so others can evaluate it.
Chiming in to say this would be incredibly helpful, especially for folks new to Kubernetes and Helm.
As a new user working through the helm install, I fully agree: quite a bit is missing to get this installed and production-ready. I'm currently working through issues with my distributor and ingester pods OOMing, and it looks like it's because there are no memory limits set on the pods and GOMEMLIMIT is not configured.
If I'm able to get Cortex running correctly, I will be sure to share my values here.
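In the meantime, for anyone else hitting the same OOMs, the shape of the fix I'm testing looks roughly like this (a sketch only; it assumes the chart's per-component resources and env keys, and the numbers are placeholders to size for your own nodes):

    ingester:
      resources:
        limits:
          memory: "16Gi"      # hard cap enforced by Kubernetes
      env:
        - name: GOMEMLIMIT    # soft limit for the Go runtime, set below the container limit
          value: "14000MiB"   # so the GC collects before the kernel OOM-kills the pod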
Here is the Ansible task we are using to do the helm install of Cortex; the values are specified inline:
- name: Helm Cortex for Prometheus
  kubernetes.core.helm:
    name: cortex
    binary_path: "{{ helm310_binary_path }}"
    kubeconfig: "{{ context_file }}"
    context: "{{ aks_name }}"
    chart_ref: cortex-helm/cortex
    chart_version: v2.2.0
    wait: true
    wait_timeout: 600s
    release_namespace: "{{ namespace }}"
    values:
      ingress:
        enabled: true
        ingressClass:
          name: "nginx"
        annotations:
          cert-manager.io/cluster-issuer: "{{ namespace }}-letsencrypt-issuer"
          kubernetes.io/ingress.class: internal-nginx
        hosts:
          - host: "cortex.{{ dns_zone }}"
            paths:
              - /
        tls:
          - hosts:
              - "cortex.{{ dns_zone }}"
            secretName: cortex-tls
      ingester:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
        replicas: 4
        resources:
          limits:
            memory: "16Gi"
        env:
          # keep the Go heap below the container limit so the GC runs before the kernel OOM-kills the pod
          - name: GOMEMLIMIT
            value: 14000MiB
      distributor:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
        replicas: 4
        resources:
          limits:
            memory: "8Gi"
        env:
          - name: GOMEMLIMIT
            value: 7000MiB
      alertmanager:
        enabled: false
      ruler:
        enabled: false
      query_frontend:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      querier:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      nginx:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      store_gateway:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      compactor:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      config:
        limits:
          max_label_names_per_series: 50
          max_series_per_metric: 0  # 0 disables the per-metric series limit
        auth_enabled: true
        memberlist:
          abort_if_cluster_join_fails: false
          join_members:
            # auto DNS detection was not working, so the fully qualified service name is given explicitly
            - cortex-memberlist.cortex.svc.cluster.local
        querier:
          store_gateway_addresses: cortex-store-gateway-headless.cortex.svc.cluster.local:9095
        blocks_storage:
          backend: azure
          azure:
            account_name: "{{ storage_account_name }}"
            account_key: "{{ storage_account_keys_output.0.value }}"
            container_name: "cortex"
            endpoint_suffix: "blob.core.windows.net"
          tsdb:
            dir: /data/tsdb
          bucket_store:
            sync_dir: /data/tsdb
            bucket_index:
              enabled: true
        ruler_storage:
          azure:
            account_name: "{{ storage_account_name }}"
            account_key: "{{ storage_account_keys_output.0.value }}"
            container_name: "cortex"
            endpoint_suffix: "blob.core.windows.net"
        alertmanager_storage:
          azure:
            account_name: "{{ storage_account_name }}"
            account_key: "{{ storage_account_keys_output.0.value }}"
            container_name: "cortex"
            endpoint_suffix: "blob.core.windows.net"
Things I had issues with and had to tweak: the memory limits and GOMEMLIMIT sizing for the ingester and distributor pods; auto DNS detection for memberlist was not working, so the FQDN had to be specified in join_members; and we also had to set limits for max label names and max series.
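For anyone who wants the same overrides without Ansible, a rough sketch of what they might look like as a standalone values file (assuming the chart keys shown above; the file name is hypothetical and the storage sections are omitted, so adapt to your own setup):

    # values-production.yaml (hypothetical name), applied with:
    #   helm install cortex cortex-helm/cortex -f values-production.yaml
    ingester:
      replicas: 4
      resources:
        limits:
          memory: "16Gi"
      env:
        - name: GOMEMLIMIT
          value: "14000MiB"   # ~85% of the 16Gi (16384MiB) limit
    distributor:
      replicas: 4
      resources:
        limits:
          memory: "8Gi"
      env:
        - name: GOMEMLIMIT
          value: "7000MiB"    # ~85% of the 8Gi (8192MiB) limit
    config:
      limits:
        max_label_names_per_series: 50
        max_series_per_metric: 0
      memberlist:
        abort_if_cluster_join_fails: false
        join_members:
          # spell out the FQDN since auto DNS detection was not working
          - cortex-memberlist.cortex.svc.cluster.local

The GOMEMLIMIT values mirror the ratios used above: 14000MiB is about 85% of 16384MiB, and 7000MiB about 85% of 8192MiB, leaving headroom for stacks and other non-Go-heap memory.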