kube-thanos
Full example of kube-thanos and kube-prometheus
Hey! I've spent a good part of today combining kube-thanos and kube-prometheus into one jsonnet file. I think the process could have been a lot smoother with examples (like a full combo of the two mixed together) and if some defaults were set to mesh a bit more nicely with kube-prometheus (such as the namespace).
I could potentially throw my config at you, the developers, if I get the opportunity to remove confidential information. Would that be of use?
Very nice that you combined both projects and that's exactly what we do. Honestly, I am not totally sure how valuable that would be to most people. I can see that it might get people started quicker, but I'm sure everybody still has a lot of small things to tweak.
In general, what took you the longest in that process?
> Very nice that you combined both projects and that's exactly what we do.
Yay! Always nice to see I don't make bad decisions haha
> Honestly, I am not totally sure how valuable that would be to most people.
You answered it, it will get people started quicker. I wouldn't say the default generated manifests should include any kube-prom, but it would certainly be useful to have examples available. Yeah, people will want to tweak things for sure. Having less of a hurdle to do that means people get more time to experiment because the boilerplate is out of the way.
> In general, what took you the longest in that process?
It was a combination of multiple things, which I think says more about the ecosystem kube-thanos is dealing with. I'll list them all, even though not all of them are related to the OP.
- Namespace wasn't the same as prometheus (monitoring). Very minor, but could cause hiccups for those coming from that community. I wouldn't even suggest changing it; it would just be ideal to abstract over it in an example and perhaps use a variable (see the sketch below).
- Cluster DNS was wrong and I missed that, but it doesn't exactly appear as an obvious change when viewing the example. Your example actually works for 90% of people but I thought I'd be thorough and mention it haha
- No dashboards. I had to pull in dashboards from thanos upstream (had to do a PR too to fix an issue with integration) to get any useful data; however, those don't display most bits of data. I'd be happy to send over that section of my config so you can see what I'm doing there.
I've had a few 3am'ers this week already, so I've practically forgotten most of the setup hiccups I went through, sorry. But those are what I remember :)
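To illustrate the "abstract over it with a variable" idea from the list above, here is a rough sketch only; the names namespace, clusterDomain, and storeAddress are mine, not anything either project exposes:
local namespace = 'monitoring';         // kube-prometheus' default namespace
local clusterDomain = 'cluster.local';  // change this if your cluster DNS differs

// Both stacks would then take the same value, e.g.
//   kube-prometheus: { _config+:: { namespace: namespace } }
//   kube-thanos:     { config+:: { namespace: namespace } }

// and store addresses can be built from the same two variables:
local storeAddress(name) =
  'dnssrv+_grpc._tcp.%s.%s.svc.%s' % [name, namespace, clusterDomain];

{ stores: [storeAddress('thanos-store')] }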
Nice. Thank you for posting that earlier.
We had some discussions on the CNCF Slack, and it seems that we generally really want to integrate these more tightly, as you proposed. Now @brancz actually proposed to add the Thanos Querier as a general entrypoint to kube-prometheus, and with that we can pretty much add a combination of Thanos + Prometheus to kube-prometheus. :) It's probably best to start like this (rough sketch after the list):
- Add the Thanos Querier as general entrypoint
- Add an optional mixin that can be merged on top of the stack so that we add the Sidecar (it might actually be supported already)
- Add an optional mixin that can be merged on top of the stack so that we add the Store
and then we can see what makes sense next :) Thank you for raising this! :100:
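As a rough sketch of what that could look like for a user merging those optional pieces on top of the stack: the sidecar addon below already exists in kube-prometheus, while the store one is purely hypothetical here (the file name is made up).
(import 'kube-prometheus/kube-prometheus.libsonnet') +
// already shipped today: wires the Thanos sidecar into the Prometheus resource
(import 'kube-prometheus/kube-prometheus-thanos-sidecar.libsonnet') +
// hypothetical optional mixin from the list above (file name made up here):
// (import 'kube-prometheus/kube-prometheus-thanos-store.libsonnet') +
{ _config+:: { namespace: 'monitoring' } }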
You're welcome, it's the least I can do. I'm just glad it's got a conversation going!
Kube-prom does indeed already include a mixin to add the thanos sidecar. I originally thought it deployed the whole stack, but I had to do some research and look at a deep-dive to get a better understanding :) Would you like me to close this?
Hey @GTB3NW,
Do you have an example of your configuration? I'm trying to get this running on our side using the same setup (kube-prometheus, kube-thanos) & I'm running into a handful of issues. thanos-rule-% won't start, and when I'm looking at logs, I just see a whole bunch of DNS failures, wondering if it's related to the issues you ran into.
level=error ts=2020-08-24T11:48:39.568115165Z caller=main.go:212 err="lookup SRV records \"_http._tcp.thanos-query.thanos.svc.cluster.local\": lookup _http._tcp.thanos-query.thanos.svc.cluster.local on 10.76.7.52:53: no such host\nrule command failed\nmain.main\n\t/go/src/github.com/thanos-io/thanos/cmd/thanos/main.go:212\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1373"
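For anyone hitting the same log line: Kubernetes only publishes an SRV record of the shape _http._tcp.<service>.<namespace>.svc.<cluster-domain> when a Service with that name exists in that namespace and has a TCP port literally named http. A minimal sketch of a matching Service in jsonnet, with the name and namespace taken from the log above and the ports/labels assumed from the rest of this thread:
{
  apiVersion: 'v1',
  kind: 'Service',
  // name and namespace taken from the log above; yours may well be 'monitoring'
  metadata: { name: 'thanos-query', namespace: 'thanos' },
  spec: {
    // selector labels are an assumption; match whatever kube-thanos generated for you
    selector: { 'app.kubernetes.io/name': 'thanos-query' },
    ports: [
      { name: 'grpc', port: 10901 },
      // the port name 'http' is what the _http._tcp SRV lookup above needs
      { name: 'http', port: 9090 },
    ],
  },
}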
Hey sorry only just seen this. I'll see if I can strip back an example for you tomorrow if I get time. No promises but I'll see what I can do.
Any luck on this @GTB3NW?
Hey really sorry I didn't get back to you!
I've given it like 5 skims and haven't found anything confidential or any domains or anything, but if you spot something I may want to remove gimme a poke :) :+1:
If you look at the tq variable, you'll see there's a string you'll need to modify with your cluster DNS in there. I couldn't think of a better way to do it TBH so you'll have to stick with that. Just modify cluster.local with whatever you named your cluster.
local k = import 'ksonnet/ksonnet.beta.4/k.libsonnet';
local sts = k.apps.v1.statefulSet;
local deployment = k.apps.v1.deployment;
local t = (import 'kube-thanos/thanos.libsonnet');
local thanosDashboards =
(
import 'mixin/mixin.libsonnet'
).grafanaDashboards;
local kp =
(import 'kube-prometheus/kube-prometheus.libsonnet') +
(import 'kube-prometheus/kube-prometheus-kubespray.libsonnet') +
(import 'kube-prometheus/kube-prometheus-thanos-sidecar.libsonnet') +
(import 'kube-prometheus/kube-prometheus-custom-metrics.libsonnet') +
{
_config+:: {
namespace: 'monitoring',
versions+:: {
grafana: "7.1.1",
},
grafana+:: {
datasources: [
{
name: 'prometheus',
type: 'prometheus',
access: 'proxy',
orgId: 1,
url: 'http://thanos-query.' + $._config.namespace + '.svc:9090',
version: 1,
editable: false,
},
{
name: 'prometheus_original',
type: 'prometheus',
access: 'proxy',
orgId: 1,
url: 'http://' + $._config.prometheus.serviceName + '.' + $._config.namespace + '.svc:9090',
version: 1,
editable: false,
}
],
config+: {
sections+: {
server+: {
root_url: std.extVar('grafana_url'),
},
},
},
},
},
// Configure External URL's per application
alertmanager+:: {
alertmanager+: {
spec+: {
externalUrl: std.extVar('alertmanager_url'),
},
},
},
grafanaDashboards+:: {
} + {
[name]: thanosDashboards[name] for name in std.objectFields(thanosDashboards)
},
prometheus+:: {
prometheus+: {
spec+: {
externalUrl: std.extVar('prometheus_url'),
},
},
},
};
local kt = {
config+:: {
local cfg = self,
namespace: 'monitoring',
version: 'v0.13.0-rc.0',
image: 'quay.io/thanos/thanos:' + cfg.version,
objectStorageConfig: {
name: 'thanos-objectstorage',
key: 'thanos.yaml',
},
volumeClaimTemplate: {
spec: {
accessModes: ['ReadWriteOnce'],
storageClassName: "thanos-block",
resources: {
requests: {
storage: '10Gi',
},
},
},
},
},
};
local ts = t.store + t.store.withVolumeClaimTemplate + t.store.withServiceMonitor + kt + {
config+:: {
name: 'thanos-store',
replicas: 1,
},
};
local tq = t.query + t.query.withServiceMonitor + kt + {
config+:: {
name: 'thanos-query',
replicas: 1,
stores: [
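// dnssrv+ tells Thanos Query to resolve SRV records for the address below; the
// cluster.local suffix is the cluster DNS domain mentioned above, so change it
// if your cluster uses a different domain.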
'dnssrv+_grpc._tcp.%s.%s.svc.cluster.local' % [service.metadata.name, service.metadata.namespace]
for service in [ts.service]
] + [
kp._config.prometheus.serviceName + '.' + kp._config.namespace + '.svc:10901',
],
replicaLabels: ['prometheus_replica', 'rule_replica'],
},
};
local manifests =
{ ['setup/0namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{
['setup/prometheus-operator-' + name]: kp.prometheusOperator[name]
for name in std.filter((function(name) name != 'serviceMonitor'), std.objectFields(kp.prometheusOperator))
} +
// serviceMonitor is separated so that it can be created after the CRDs are ready
{ 'prometheus-operator-serviceMonitor': kp.prometheusOperator.serviceMonitor } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
{ ['thanos-store-' + name]: ts[name] for name in std.objectFields(ts) } +
{ ['thanos-query-' + name]: tq[name] for name in std.objectFields(tq) };
local kustomizationResourceFile(name) = './manifests/' + name + '.yaml';
local kustomization = {
apiVersion: 'kustomize.config.k8s.io/v1beta1',
kind: 'Kustomization',
resources: std.map(kustomizationResourceFile, std.objectFields(manifests)),
};
manifests {
'../kustomization': kustomization,
}
This is a stripped down version of our config, so I apologise if it doesn't compile, but it shouldn't be too hard to get working. There are some variables passed in via the CLI that you'll need to either pass in or remove (I use them for different environment URLs).
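If you would rather not pass those on the command line at all, one alternative is to swap the std.extVar(...) calls for plain locals. Just a sketch; the URLs below are placeholders:
// Placeholder URLs; substitute your own and reference these locals where the
// config above calls std.extVar('grafana_url') and friends.
local grafana_url = 'https://grafana.example.com';
local alertmanager_url = 'https://alertmanager.example.com';
local prometheus_url = 'https://prometheus.example.com';

{ grafana_url: grafana_url, alertmanager_url: alertmanager_url, prometheus_url: prometheus_url }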
You'll notice that I add the old prometheus source to grafana just in case; you can easily remove this (prometheus_original) and it wouldn't make a difference.
Ohh, one last note: it's all deployed to the same namespace, so if you want to change that you'll probably need to edit a fair few places.
Good luck!
Thank you very much! This example was helpful in setting up Thanos with the kube-prometheus stack :)
This is a stripped down version of how I've integrated kube-thanos and kube-prometheus. I use this mainly to test things out locally by making changes, running build.sh, and hoping my manifests generate, as I'm fairly new to jsonnet.
We actually use ArgoCD to deploy our kube-prom stack, so the build snippet at the very bottom would look completely different for us and doesn't compile with build.sh (ArgoCD needs the jsonnet to generate an array of objects instead of a map in order to pick up and sync directly to the cluster; see the sketch after the config below).
I've only deployed the sidecar and compact for the moment, with no issues so far. Working through the other Thanos components as I speak.
local secret(name, type, stringData) = {
apiVersion: 'v1',
kind: 'Secret',
metadata: {
name: name,
},
type: type,
stringData: stringData,
};
local t = import 'kube-thanos/thanos.libsonnet';
local kp =
(import 'kube-prometheus/main.libsonnet') +
(import 'kube-prometheus/addons/anti-affinity.libsonnet') +
(import 'kube-prometheus/addons/managed-cluster.libsonnet') +
(import 'kube-prometheus/addons/all-namespaces.libsonnet') +
{
values+:: {
common+: {
namespace: 'monitoring',
},
prometheus+: {
namespaces+: [],
thanos: { //thanos-sidecar
version: '0.21.0',
image: 'quay.io/thanos/thanos:v0.21.0',
objectStorageConfig: {
key: 'thanos.yaml', //how file inside secret is called
name: 'thanos-objectstorage', //name of K8s secret within the config
},
},
},
},
// -------------------
// End of values config
// --------------------
// add prometheus data persistance
prometheus+:: {
prometheus+: {
spec+: {
// If retention not specified, default will be '--storage.tsdb.retention=24h' passed to prometheus by prometheus-operator.
retention: '30d',
retentionSize: '8GB',
walCompression: true,
storage: {
volumeClaimTemplate: {
apiVersion: 'v1',
kind: 'PersistentVolumeClaim',
spec: {
accessModes: ['ReadWriteOnce'],
resources: { requests: { storage: '10Gi' } },
storageClassName: 'standard-encrypted',
},
},
},
},
},
// override Prometheus svcAcct to add annotation to link cloud svcacct to prometheus k8sSvcAcct
serviceAccount+: {
metadata+: {
annotations+: {
'iam.gke.io/gcp-service-account': '[email protected]',
},
},
},
// create secret for thanos deployment
thanosSecret: secret(
'thanos-objectstorage',
'opaque',
{
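// The string below is just a placeholder; a hypothetical way to render a real
// bucket config inline would be std.manifestYamlDoc({ type: 'GCS', config: { bucket: 'my-thanos-bucket' } }).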
'thanos.yaml': 'your bucket config',
},
),
},
};
//thanos-compact
local c = t.compact(kp.values.common + kp.values.prometheus.thanos { //use namespace, version, image and objectStorageConfig
replicas: 1,
serviceMonitor: true,
resources: { requests: { cpu: '250m', memory: '1Gi' }, limits: { cpu: 1, memory: '1Gi' } },
volumeClaimTemplate: {
spec: {
accessModes: ['ReadWriteOnce'],
storageClassName: 'standard-encrypted',
resources: {
requests: {
storage: '10Gi',
},
},
},
},
}) + {
serviceAccount+: { // override thanos-compact svcAcct to add annotation to link cloud svcacct to thanos-compact K8sSvcAcct
metadata+: {
annotations+: {
'iam.gke.io/gcp-service-account': '[email protected]',
},
},
},
};
// extra config stripped away for brevity
local s = t.store(kp.values.common + kp.values.prometheus.thanos {
replicas: 1,
serviceMonitor: true,
});
local q = t.query(kp.values.common + kp.values.prometheus.thanos {
replicas: 1,
replicaLabels: ['prometheus_replica', 'rule_replica'],
serviceMonitor: true,
});
{ 'setup/0namespace-namespace': kp.kubePrometheus.namespace } +
{
['setup/prometheus-operator-' + name]: kp.prometheusOperator[name]
for name in std.filter((function(name) name != 'serviceMonitor' && name != 'prometheusRule'), std.objectFields(kp.prometheusOperator))
} +
// serviceMonitor and prometheusRule are separated so that they can be created after the CRDs are ready
{ 'prometheus-operator-serviceMonitor': kp.prometheusOperator.serviceMonitor } +
{ 'prometheus-operator-prometheusRule': kp.prometheusOperator.prometheusRule } +
{ 'kube-prometheus-prometheusRule': kp.kubePrometheus.prometheusRule } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['blackbox-exporter-' + name]: kp.blackboxExporter[name] for name in std.objectFields(kp.blackboxExporter) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['kubernetes-' + name]: kp.kubernetesControlPlane[name] for name in std.objectFields(kp.kubernetesControlPlane) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name]: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
{ ['thanos-compact-' + name]: c[name] for name in std.objectFields(c) if c[name] != null } +
{ ['thanos-store-' + name]: s[name] for name in std.objectFields(s) } +
{ ['thanos-query-' + name]: q[name] for name in std.objectFields(q) }
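On the ArgoCD remark above, a rough sketch of emitting an array instead of a map (the local name manifests is mine): bind the map of objects built above to a local and list its values.
// Hypothetical ArgoCD variant: emit the manifest objects as an array instead of a map.
local manifests = { /* ... the map of objects built above ... */ };

[manifests[name] for name in std.objectFields(manifests)]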
Fuck! I will use VictoriaMetrics, I will give up on Thanos!