
Sidecar is significantly slower than the underlying Prometheus queries

Open jon-rei opened this issue 1 year ago • 8 comments

Thanos, Prometheus and Golang version used: Thanos: v0.32.5, Prometheus: v2.45.0

What happened: The Thanos sidecar is significantly slower than the actual Prometheus query when queried by the Thanos querier. We can see query times on the sidecar of up to 2 minutes, but the actual Prometheus query only takes a few seconds. In the end, this makes the whole Thanos setup very slow.

What you expected to happen: That query times on the Thanos sidecar would be close to those of the underlying Prometheus.

How to reproduce it (as minimally and precisely as possible): Could be very environment dependent. We are trying to query a metric (container_network_receive_bytes_total) with ~26k series and ~6 million samples.
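To put numbers on the gap, the same range query can be timed against Prometheus directly and against the Thanos querier. The following is only a minimal sketch: the base URLs (`http://localhost:9090` for Prometheus, `http://localhost:10902` for the querier), the time window, and the step are assumptions to be replaced with values from your environment, and the helper names `build_query_range_url`, `time_query`, and `compare` are hypothetical. It relies on the `/api/v1/query_range` HTTP endpoint, which both Prometheus and the Thanos querier expose.

```python
import json
import time
import urllib.parse
import urllib.request

# Metric named in this issue (~26k series, ~6 million samples).
METRIC = "container_network_receive_bytes_total"


def build_query_range_url(base_url, query, start, end, step):
    """Build a /api/v1/query_range URL for a Prometheus-compatible API."""
    params = urllib.parse.urlencode(
        {"query": query, "start": start, "end": end, "step": step}
    )
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def time_query(url, timeout=300):
    """Run one query; return (wall-clock seconds, number of series returned)."""
    started = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    elapsed = time.monotonic() - started
    series = len(body.get("data", {}).get("result", []))
    return elapsed, series


def compare(endpoints, query=METRIC, window_seconds=3600, step=30):
    """Time the same range query against each named endpoint.

    `endpoints` maps a label to a base URL. The URLs are assumptions,
    not values taken from the issue.
    """
    end = int(time.time())
    start = end - window_seconds
    results = {}
    for name, base_url in endpoints.items():
        url = build_query_range_url(base_url, query, start, end, step)
        results[name] = time_query(url)
    return results
```

Calling `compare({"prometheus": "http://localhost:9090", "thanos-query": "http://localhost:10902"})` returns the elapsed time and series count per endpoint. Running it a few times helps separate caching effects from a consistent difference between the two paths.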

Anything else we need to know: The Thanos sidecar pushes metrics to our S3 bucket every 2 hours and we use the Querier to query the sidecar. We also use the Thanos query engine.

We set the following resources for the sidecar, but in practice it uses only a fraction of them and is never throttled.

resources:
  limits:
    cpu: 3
    memory: 4Gi
  requests:
    cpu: 1
    memory: 512Mi

Traces: thanos-sidecar-slow thanos-sidecar-slow-2

I've found other issues describing this (#4304, #631), but unfortunately they were closed without a helpful resolution.

jon-rei • Nov 27 '23 09:11