
Full example of kube-thanos and kube-prometheus

Open GTB3NW opened this issue 4 years ago • 10 comments

Hey! I've spent a good part of today combining kube-thanos and kube-prometheus into one jsonnet file. I think the process could have been a lot smoother with examples (like a full combination of the two) and if some defaults were set to mesh a bit more nicely with kube-prometheus (such as the namespace).

I could potentially throw my config at you, the developers, if I get the opportunity to remove confidential information. Would that be of use?

GTB3NW avatar Jun 18 '20 19:06 GTB3NW

Very nice that you combined both projects; that's exactly what we do. Honestly, I am not totally sure how valuable that would be to most people. I can see that it might get people started quicker, but I'm sure everybody will still have a lot of small things to tweak.

In general, what took you the longest in that process?

metalmatze avatar Jun 22 '20 15:06 metalmatze

Very nice that you combined both projects and that's exactly what we do.

Yay! Always nice to see I don't make bad decisions haha

Honestly, I am not totally sure how valuable that would be to most people.

You answered it yourself: it will get people started quicker. I wouldn't say the default generated manifests should include any kube-prometheus, but it would certainly be useful to have examples available. Yes, people will want to tweak things for sure, but having less of a hurdle means they get more time to experiment because the boilerplate is out of the way.

In general, what took you the longest in that process?

It was a combination of multiple things, which I think says more about the ecosystem kube-thanos is dealing with. I'll list them all, even though not all of them are related to the OP.

  • The default namespace wasn't the same as kube-prometheus's monitoring namespace. Very minor, but it could cause hiccups for those coming from that community. I wouldn't even suggest changing it; it would just be ideal to abstract over it in an example and perhaps use a variable.
  • The cluster DNS was wrong for my setup and I missed it; it doesn't exactly stand out as an obvious thing to change when viewing the example. Your example actually works for 90% of people, but I thought I'd be thorough and mention it haha
  • No dashboards. I had to pull in dashboards from the upstream Thanos mixin (and had to do a PR to fix an integration issue) to get any useful data; however, those still don't display most bits of data. I'd be happy to send over that section of my config so you can see what I'm doing there (a rough sketch follows below).
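
Roughly, that section of my config looks like this (simplified; the import path assumes the Thanos mixin is vendored as 'mixin', and it's the same approach as the fuller config I post later in this thread):

local thanosDashboards = (import 'mixin/mixin.libsonnet').grafanaDashboards;

local kp =
  (import 'kube-prometheus/kube-prometheus.libsonnet') +
  {
    _config+:: { namespace: 'monitoring' },
    // Merge every dashboard from the Thanos mixin into kube-prometheus's Grafana.
    grafanaDashboards+:: {
      [name]: thanosDashboards[name]
      for name in std.objectFields(thanosDashboards)
    },
  };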

I've had a few 3am'ers this week already, so I've practically forgotten most of the setup hiccups I went through, sorry. But those are the ones I remember :)

GTB3NW avatar Jun 23 '20 07:06 GTB3NW

Nice. Thank you for posting that earlier.

We had some discussions on the CNCF Slack, and it seems that we generally want to integrate these more tightly, as you proposed. @brancz actually proposed adding the Thanos Querier as the general entrypoint to kube-prometheus, and with that we can pretty much add the Thanos + Prometheus combination to kube-prometheus. :) It's probably best to start like this:

  • Add the Thanos Querier as the general entrypoint
  • Add an optional mixin that can be merged on top of the stack to add the Sidecar (it might actually be supported already)
  • Add an optional mixin that can be merged on top of the stack to add the Store

and then we can see what makes sense next :) Thank you for raising this! :100:
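
To make the first step concrete, here is a very rough sketch of what that could look like on top of today's jsonnet (it reuses the import paths, ports, and labels from the config posted later in this thread; it is not a finished kube-prometheus API):

local t = import 'kube-thanos/thanos.libsonnet';

// Base kube-prometheus stack with the existing Thanos sidecar mixin merged on top.
local kp =
  (import 'kube-prometheus/kube-prometheus.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-thanos-sidecar.libsonnet') +
  { _config+:: { namespace: 'monitoring' } };

// Thanos Querier as the general entrypoint, fanning out to the sidecar's gRPC port.
local tq = t.query + t.query.withServiceMonitor + {
  config+:: {
    namespace: kp._config.namespace,
    name: 'thanos-query',
    version: 'v0.13.0-rc.0',
    image: 'quay.io/thanos/thanos:v0.13.0-rc.0',
    replicas: 1,
    stores: [kp._config.prometheus.serviceName + '.' + kp._config.namespace + '.svc:10901'],
    replicaLabels: ['prometheus_replica'],
  },
};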

metalmatze avatar Jun 25 '20 08:06 metalmatze

You're welcome, it's the least I can do. I'm just glad it's got a conversation going!

kube-prometheus does indeed already include a mixin to add the Thanos sidecar. I originally thought it deployed the whole stack, but I did some research and read a deep-dive to get a better understanding :) Would you like me to close this?

GTB3NW avatar Jun 28 '20 14:06 GTB3NW

Hey @GTB3NW,

Do you have an example of your configuration? I'm trying to get this running on our side using the same setup (kube-prometheus, kube-thanos) and I'm running into a handful of issues. thanos-rule-% won't start, and when I look at the logs I just see a whole bunch of DNS failures; I'm wondering if it's related to the issues you ran into.

level=error ts=2020-08-24T11:48:39.568115165Z caller=main.go:212 err="lookup SRV records \"_http._tcp.thanos-query.thanos.svc.cluster.local\": lookup _http._tcp.thanos-query.thanos.svc.cluster.local on 10.76.7.52:53: no such host\nrule command failed\nmain.main\n\t/go/src/github.com/thanos-io/thanos/cmd/thanos/main.go:212\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1373"

dominicgunn avatar Aug 24 '20 11:08 dominicgunn

Hey, sorry, I've only just seen this. I'll see if I can strip back an example for you tomorrow if I get time. No promises, but I'll see what I can do.

GTB3NW avatar Aug 27 '20 22:08 GTB3NW

Any luck on this @GTB3NW?

dominicgunn avatar Sep 02 '20 12:09 dominicgunn

Any luck on this @GTB3NW?

Hey really sorry I didn't get back to you!

I've given it about five skims and haven't found anything confidential or any domains or anything, but if you spot something I may want to remove, give me a poke :) :+1:

If you look at the tq variable, you'll see there's a string in there you'll need to modify with your cluster DNS. I couldn't think of a better way to do it TBH, so you'll have to stick with that. Just replace cluster.local with whatever your cluster's DNS domain is.

local k = import 'ksonnet/ksonnet.beta.4/k.libsonnet';
local sts = k.apps.v1.statefulSet;
local deployment = k.apps.v1.deployment;
local t = (import 'kube-thanos/thanos.libsonnet');

local thanosDashboards =
  (
    import 'mixin/mixin.libsonnet'
  ).grafanaDashboards;

local kp =
  (import 'kube-prometheus/kube-prometheus.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-kubespray.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-thanos-sidecar.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-custom-metrics.libsonnet') +
  {
    _config+:: {
      namespace: 'monitoring',
      versions+:: {
          grafana: "7.1.1",
      },
      grafana+:: {
        datasources: [
          {
            name: 'prometheus',
            type: 'prometheus',
            access: 'proxy',
            orgId: 1,
            url: 'http://thanos-query.' + $._config.namespace + '.svc:9090',
            version: 1,
            editable: false,
          },
          {
            name: 'prometheus_original',
            type: 'prometheus',
            access: 'proxy',
            orgId: 1,
            url: 'http://' + $._config.prometheus.serviceName + '.' + $._config.namespace + '.svc:9090',
            version: 1,
            editable: false,
          }
        ],
        config+: {
          sections+: {
            server+: {
              root_url: std.extVar('grafana_url'),
            },
          },
        },
      },
    },
    // Configure External URL's per application
    alertmanager+:: {
      alertmanager+: {
        spec+: {
          externalUrl: std.extVar('alertmanager_url'),
        },
      },
    },
    grafanaDashboards+:: {
      [name]: thanosDashboards[name]
      for name in std.objectFields(thanosDashboards)
    },
    prometheus+:: {
      prometheus+: {
        spec+: {
          externalUrl: std.extVar('prometheus_url'),
        },
      },
    },
    
  };

local kt = {
  config+:: {
    local cfg = self,
    namespace: 'monitoring',
    version: 'v0.13.0-rc.0',
    image: 'quay.io/thanos/thanos:' + cfg.version,
    objectStorageConfig: {
      name: 'thanos-objectstorage',
      key: 'thanos.yaml',
    },
    volumeClaimTemplate: {
      spec: {
        accessModes: ['ReadWriteOnce'],
        storageClassName: "thanos-block",
        resources: {
          requests: {
            storage: '10Gi',
          },
        },
      },
    },
  },
};

local ts = t.store + t.store.withVolumeClaimTemplate + t.store.withServiceMonitor + kt + {
  config+:: {
    name: 'thanos-store',
    replicas: 1,
  },
};

local tq = t.query + t.query.withServiceMonitor + kt + {
  config+:: {
    name: 'thanos-query',
    replicas: 1,
    stores: [
      'dnssrv+_grpc._tcp.%s.%s.svc.cluster.local' % [service.metadata.name, service.metadata.namespace]
      for service in [ts.service]
    ] + [
      kp._config.prometheus.serviceName + '.' + kp._config.namespace + '.svc:10901',
    ],
    replicaLabels: ['prometheus_replica', 'rule_replica'],
  },
};

local manifests =
  { ['setup/0namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
  {
    ['setup/prometheus-operator-' + name]: kp.prometheusOperator[name]
    for name in std.filter((function(name) name != 'serviceMonitor'), std.objectFields(kp.prometheusOperator))
  } +
  // serviceMonitor is separated so that it can be created after the CRDs are ready
  { 'prometheus-operator-serviceMonitor': kp.prometheusOperator.serviceMonitor } +
  { ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
  { ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
  { ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
  { ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
  { ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
  { ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
  { ['thanos-store-' + name]: ts[name] for name in std.objectFields(ts) } +
  { ['thanos-query-' + name]: tq[name] for name in std.objectFields(tq) };

local kustomizationResourceFile(name) = './manifests/' + name + '.yaml';
local kustomization = {
  apiVersion: 'kustomize.config.k8s.io/v1beta1',
  kind: 'Kustomization',
  resources: std.map(kustomizationResourceFile, std.objectFields(manifests)),
};

manifests {
  '../kustomization': kustomization,
}

This is a stripped-down version of our config, so I apologise if it doesn't compile, but it shouldn't be too hard to get working. There are some variables passed in via the CLI (std.extVar) that you'll need to either pass in or remove (I use them for different environment URLs).
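
If you do want to avoid hand-editing the string, one small variation on the tq block above is to pull the cluster DNS domain into its own variable (clusterDomain is just a name I made up here):

local clusterDomain = 'cluster.local';  // set this to your cluster's DNS domain

local tq = t.query + t.query.withServiceMonitor + kt + {
  config+:: {
    name: 'thanos-query',
    replicas: 1,
    stores: [
      // Same SRV lookup as above, but with the domain factored out.
      'dnssrv+_grpc._tcp.%s.%s.svc.%s' % [service.metadata.name, service.metadata.namespace, clusterDomain]
      for service in [ts.service]
    ] + [
      kp._config.prometheus.serviceName + '.' + kp._config.namespace + '.svc:10901',
    ],
    replicaLabels: ['prometheus_replica', 'rule_replica'],
  },
};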

You'll notice that I add the old Prometheus source to Grafana just in case; you can easily remove it (prometheus_original) and it won't make a difference.

Ohh, one last note: it's all deployed to the same namespace, so if you want to change that you'll probably need to edit a fair few places.

Good luck!

GTB3NW avatar Sep 08 '20 11:09 GTB3NW

Thank you very much! This example helped us set up Thanos with the kube-prometheus stack :)

stradigi-eric-hammel avatar Oct 07 '20 17:10 stradigi-eric-hammel

This is a stripped-down version of how I've integrated kube-thanos and kube-prometheus. I use this mainly to test things out locally by making changes, running build.sh, and hoping my manifests generate correctly, as I'm fairly new to jsonnet.

We actually use ArgoCD to deploy our kube-prometheus stack, so for us the build snippet at the very bottom would look completely different and doesn't compile with build.sh (ArgoCD needs the jsonnet to generate an array of objects instead of a map in order to pick them up and sync them directly to the cluster; see the sketch after the config below).

I've only deployed the sidecar and compact for the moment, with no issues so far. I'm working through the other Thanos components as we speak.

local secret(name, type, stringData) = {
  apiVersion: 'v1',
  kind: 'Secret',
  metadata: {
    name: name,
  },
  type: type,
  stringData: stringData, 
};

local t = import 'kube-thanos/thanos.libsonnet';

local kp =
  (import 'kube-prometheus/main.libsonnet') +
  (import 'kube-prometheus/addons/anti-affinity.libsonnet') +
  (import 'kube-prometheus/addons/managed-cluster.libsonnet') +
  (import 'kube-prometheus/addons/all-namespaces.libsonnet') +
  {
    values+:: {
      common+: {
        namespace: 'monitoring',
      },
      prometheus+: {
        namespaces+: [],
        thanos: { //thanos-sidecar
          version: '0.21.0',
          image: 'quay.io/thanos/thanos:v0.21.0',
          objectStorageConfig: {
            key: 'thanos.yaml',  //how file inside secret is called
            name: 'thanos-objectstorage',  //name of K8s secret within the config
          },
        },
      },
    },
    // -------------------
    // End of values config
    // --------------------
    // add prometheus data persistance
    prometheus+:: {
      prometheus+: {
        spec+: {
          // If retention not specified, default will be '--storage.tsdb.retention=24h' passed to prometheus by prometheus-operator.
          retention: '30d',
          retentionSize: '8GB',
          walCompression: true,
          storage: {
            volumeClaimTemplate: {
              apiVersion: 'v1',
              kind: 'PersistentVolumeClaim',
              spec: {
                accessModes: ['ReadWriteOnce'],
                resources: { requests: { storage: '10Gi' } },
                storageClassName: 'standard-encrypted',
              },
            },
          },
        },
      },
      // override Prometheus svcAcct to add annotation to link cloud svcacct to prometheus k8sSvcAcct
      serviceAccount+: {
        metadata+: {
          annotations+: {
            'iam.gke.io/gcp-service-account': '[email protected]',
          },
        },
      },
      // create secret for thanos deployment
      thanosSecret: secret(
        'thanos-objectstorage', 
        'Opaque',
        {
          'thanos.yaml': 'your bucket config',
        },
      ),
    },
  };
//thanos-compact
local c = t.compact(kp.values.common + kp.values.prometheus.thanos { //use namespace, version, image and objectStorageConfig
  replicas: 1,
  serviceMonitor: true,
  resources: { requests: { cpu: '250m', memory: '1Gi' }, limits: { cpu: 1, memory: '1Gi' } },
  volumeClaimTemplate: {
    spec: {
      accessModes: ['ReadWriteOnce'],
      storageClassName: 'standard-encrypted',
      resources: {
        requests: {
          storage: '10Gi',
        },
      },
    },
  },
}) + {
  serviceAccount+: { // override thanos-compact svcAcct to add annotation to link cloud svcacct to thanos-compact K8sSvcAcct
    metadata+: {
      annotations+: {
        'iam.gke.io/gcp-service-account': '[email protected]',
      },
    },
  },
};

// extra config stripped away for brevity
local s = t.store(kp.values.common + kp.values.prometheus.thanos {
  replicas: 1,
  serviceMonitor: true,
});

local q = t.query(kp.values.common + kp.values.prometheus.thanos {
  replicas: 1,
  replicaLabels: ['prometheus_replica', 'rule_replica'],
  serviceMonitor: true,
});


{ 'setup/0namespace-namespace': kp.kubePrometheus.namespace } +
{
  ['setup/prometheus-operator-' + name]: kp.prometheusOperator[name]
  for name in std.filter((function(name) name != 'serviceMonitor' && name != 'prometheusRule'), std.objectFields(kp.prometheusOperator))
} +
// serviceMonitor and prometheusRule are separated so that they can be created after the CRDs are ready
{ 'prometheus-operator-serviceMonitor': kp.prometheusOperator.serviceMonitor } +
{ 'prometheus-operator-prometheusRule': kp.prometheusOperator.prometheusRule } +
{ 'kube-prometheus-prometheusRule': kp.kubePrometheus.prometheusRule } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['blackbox-exporter-' + name]: kp.blackboxExporter[name] for name in std.objectFields(kp.blackboxExporter) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['kubernetes-' + name]: kp.kubernetesControlPlane[name] for name in std.objectFields(kp.kubernetesControlPlane) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
{ ['thanos-compact-' + name]: c[name] for name in std.objectFields(c) if c[name] != null } +
{ ['thanos-store-' + name]: s[name] for name in std.objectFields(s) } +
{ ['thanos-query-' + name]: q[name] for name in std.objectFields(q) }
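
As mentioned above, for ArgoCD we emit an array instead of a map. A simplified sketch of that variant (manifestMap is just a name introduced here; its body is the same map built above):

// Bind the manifest map to a local name instead of emitting it directly...
local manifestMap =
  { 'setup/0namespace-namespace': kp.kubePrometheus.namespace } +
  // ...all of the other field comprehensions from above go here unchanged...
  { ['thanos-query-' + name]: q[name] for name in std.objectFields(q) };

// ...then output a flat array of Kubernetes objects for ArgoCD to sync.
[manifestMap[name] for name in std.objectFields(manifestMap)]

If your jsonnet version has std.objectValues, that can replace the final comprehension.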

AlHood77 avatar Aug 26 '21 11:08 AlHood77

fuck! i will use victoriametrics, i will give up thanos!

chinaboy007 avatar Mar 03 '23 10:03 chinaboy007