Giedrius Statkevičius
What's your replication factor? In practice, it should be at least 3 to allow for downtime.
Hi, unfortunately nothing can be done here. A replication factor of 1 doesn't allow any downtime, and if Prometheus cannot send metrics, it retries, which increases the memory usage of Receive...
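To make the replication-factor point concrete, here is a minimal sketch of the arithmetic, assuming Receive needs a simple majority of replicas to acknowledge a write. The function names `writeQuorum` and `toleratedFailures` are hypothetical and for illustration only; they are not part of the Thanos API.

```go
package main

import "fmt"

// writeQuorum returns how many replicas must acknowledge a write,
// ASSUMING a simple majority quorum: floor(rf/2) + 1.
// Hypothetical helper, not a Thanos function.
func writeQuorum(replicationFactor int) int {
	return replicationFactor/2 + 1
}

// toleratedFailures returns how many replicas can be unavailable
// while a write can still reach the assumed majority quorum.
// Hypothetical helper, not a Thanos function.
func toleratedFailures(replicationFactor int) int {
	return replicationFactor - writeQuorum(replicationFactor)
}

func main() {
	for _, rf := range []int{1, 2, 3, 5} {
		fmt.Printf("replication factor %d: quorum %d, tolerated failures %d\n",
			rf, writeQuorum(rf), toleratedFailures(rf))
	}
}
```

Under that assumption, a replication factor of 1 or 2 tolerates no unavailable replica, while 3 tolerates one, which matches the advice to use at least 3.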
What files are in the wal directory? It sounds like someone deleted some files from it.
That's true, but I'd like to solve this problem at the RPC level once and for all so that all current and future hot RPC paths would be protected from this...
Added the path to the hash here: https://github.com/thanos-io/thanos/pull/7158 so this should help, given that you have a separate file for specifying the bucket cache configuration. A long-term fix would be to add...
Is this the same with the newest `main` version? Could you please try it? 0.31.0 is old :/
I spotted this in prod. Looking into it :eye:
Do you have this option enabled https://github.com/thanos-io/thanos/blob/main/cmd/thanos/query.go#L212? It should solve your issue.
Yeah, this optimization is something that needs to be done on the Prometheus side :/ I think this is the hot path: https://github.com/prometheus/prometheus/blob/main/tsdb/head.go#L1543-L1554 Some improvements that could be made IMHO: https://github.com/prometheus/prometheus/pull/13642...
Is `thanos_compact_iterations_total` more than 0? :thinking: