cortex-helm-chart

Create a production-ready values file

Open friedrichg opened this issue 1 year ago • 4 comments

Most users of the Helm chart have to set a few things in values.yaml to get Cortex production ready. We should create an alternate values file to make this path easier, possibly copying some of the values specified in cortex-jsonnet.

friedrichg avatar Jul 20 '23 17:07 friedrichg

@friedrichg possibly you can start by providing what you believe is a good values.yaml so others can evaluate.

jessequinn avatar Jan 10 '24 13:01 jessequinn

Chiming in to say this would be incredibly helpful, especially for folks new to Kubernetes and Helm.

abctaylor avatar Feb 25 '24 20:02 abctaylor

As a new user working through the helm install, I fully agree: there is quite a bit missing to get this installed and production ready. I'm currently working through issues with my distributor and ingester pods OOM'ing, and it looks like the cause is that no memory limits are set on the pods and GOMEMLIMIT is not configured.

If I'm able to get cortex running correctly I will be sure to share my values here.

danfinn avatar Apr 01 '24 18:04 danfinn

Here is the Ansible task we are using to do the Helm install of Cortex; the values are specified inline:

- name: Helm Cortex for Prometheus
  kubernetes.core.helm:
    name: cortex
    binary_path: "{{ helm310_binary_path }}"
    kubeconfig: "{{ context_file }}"
    context: "{{ aks_name }}"
    chart_ref: cortex-helm/cortex
    chart_version: v2.2.0
    wait: true
    wait_timeout: 600s
    release_namespace: "{{ namespace }}"
    values:
      ingress:
        enabled: true
        ingressClass:
          name: "nginx"
        annotations:
          cert-manager.io/cluster-issuer: "{{ namespace }}-letsencrypt-issuer"
          kubernetes.io/ingress.class: internal-nginx
        hosts:
          - host: "cortex.{{ dns_zone }}"
            paths:
              - /
        tls:
          - hosts:
              - "cortex.{{ dns_zone }}"
            secretName: cortex-tls
      ingester:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
        replicas: 4
        resources:
          limits:
            memory: "16Gi"
        env:
          - name: GOMEMLIMIT
            value: 14000MiB
      distributor:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
        replicas: 4
        resources:
          limits:
            memory: "8Gi"
        env:
          - name: GOMEMLIMIT
            value: 7000MiB
      alertmanager:
        enabled: false
      ruler:
        enabled: false
      query_frontend:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      querier:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      nginx:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      store_gateway:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      compactor:
        nodeSelector:
          agentpool: "{{ agent_pool }}"
      config:
        limits:
          max_label_names_per_series: 50
          max_series_per_metric: 0
        auth_enabled: true
        memberlist:
          abort_if_cluster_join_fails: false
          join_members:
            - cortex-memberlist.cortex.svc.cluster.local
        querier:
          store_gateway_addresses: cortex-store-gateway-headless.cortex.svc.cluster.local:9095
        blocks_storage:
          backend: azure
          azure:
            account_name: "{{ storage_account_name }}"
            account_key: "{{ storage_account_keys_output.0.value }}"
            container_name: "cortex"
            endpoint_suffix: "blob.core.windows.net"
          tsdb:
            dir: /data/tsdb
          bucket_store:
            sync_dir: /data/tsdb
            bucket_index:
              enabled: true
        ruler_storage:
          azure:
            account_name: "{{ storage_account_name }}"
            account_key: "{{ storage_account_keys_output.0.value }}"
            container_name: "cortex"
            endpoint_suffix: "blob.core.windows.net"
        alertmanager_storage:
          azure:
            account_name: "{{ storage_account_name }}"
            account_key: "{{ storage_account_keys_output.0.value }}"
            container_name: "cortex"
            endpoint_suffix: "blob.core.windows.net"

The things I had issues with and had to tweak were the memory limits and GOMEMLIMIT sizing for the ingester and distributor pods. Automatic DNS detection for memberlist was not working, so the FQDN had to be specified. We also had to set some limits for max labels and max series.
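
For anyone not going through Ansible, those tweaks could be pulled out into a plain values file, roughly like the sketch below. The numbers are copied from the task above and are sized for our cluster, not a recommendation. The file name is hypothetical, and the memberlist address assumes the release lives in the `cortex` namespace.

```yaml
# values-production.yaml (hypothetical name), passed to helm with -f
ingester:
  resources:
    limits:
      memory: "16Gi"
  env:
    # Kept below the container memory limit so the Go runtime
    # starts collecting before the pod reaches the limit.
    - name: GOMEMLIMIT
      value: 14000MiB
distributor:
  resources:
    limits:
      memory: "8Gi"
  env:
    - name: GOMEMLIMIT
      value: 7000MiB
config:
  limits:
    max_label_names_per_series: 50
    # As far as I understand the Cortex limits docs, 0 disables this limit.
    max_series_per_metric: 0
  memberlist:
    abort_if_cluster_join_fails: false
    join_members:
      # Explicit FQDN, since automatic DNS detection did not work for us.
      - cortex-memberlist.cortex.svc.cluster.local
```

Storage, ingress and node selectors are left out here on purpose, since those depend entirely on the environment.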

danfinn avatar Apr 17 '24 20:04 danfinn