
An unhealthy compactor in the ring should not prevent the bucket-index from being refreshed for any tenant

Open · david-vavra opened this issue 2 years ago · 7 comments

Is your feature request related to a problem? Please describe.

It seems that if a compactor instance goes unhealthy, its responsibilities (compacting as well as refreshing the bucket-index) are not taken over by the remaining (healthy) compactors unless manual action is taken (initiating "Forget" for that compactor from the ring).

Describe the solution you'd like

In setups where queriers and store-gateways do not compute the bucket-index on their own, the compactor is a single point of failure: queriers start failing queries once the bucket-index has not been updated for at least 1h (the value configured in -blocks-storage.bucket-store.bucket-index.max-stale-period). The optimal solution, imho, would be for unhealthy compactors to be excluded from the algorithm which decides which compactor computes the bucket-index for each tenant. This may lead to duplicate bucket-index computation, but we don't see that as a major issue, especially compared to the impact of not having a bucket-index at all.

Describe alternatives you've considered

Bucket-index computation is not the only responsibility of a compactor; the other is vertical and horizontal compaction of the blocks present in storage. Afaik, the impact of having non-compacted blocks in storage is not crucial to the cluster's operation: queries are still evaluated, and the only impact is degraded performance. It would of course be good if the compaction part of the compactor's responsibility were also taken over when a compactor goes unhealthy, however this may result in some blocks being compacted (and uploaded) more than once. I'm not sure whether that would be an issue, or whether one of the duplicate compacted blocks would simply be deleted in a future compactor run.

Additional context

We've experienced a loss of connection to one of our datacenters (a planned network-partition disaster recovery test) where one of the compactors was running. After approximately 45 minutes, queriers started failing due to a stale bucket-index (whose age we limit to 1h, which is the default) for two of our tenants. The compactor in the disconnected DC was, at that time, responsible for generating the bucket-index for the two tenants whose queries started failing. It seems that no other compactor took over the "orphaned" tenants.

(The issue was originally discussed on Grafana's Slack.)

david-vavra avatar Dec 12 '22 19:12 david-vavra

A while back we switched the distributors from using ring.Lifecycler to ring.BasicLifecycler so we could make use of the auto-forget functionality. This allows distributors to be removed from the ring if they haven't had a successful heartbeat in some amount of time.

Compactors currently use ring.Lifecycler. Maybe it makes sense to switch them over to ring.BasicLifecycler and enable auto-forget as well?

56quarters avatar Dec 12 '22 20:12 56quarters

Compactors currently use ring.Lifecycler. Maybe it makes sense to switch them over to ring.BasicLifecycler and enable auto-forget as well?

Sounds good to me. I don't see any possible problem with doing auto-forget in the compactor ring. In the worst case compactors may do some extra work, but they won't do anything wrong.

pstibrany avatar Dec 13 '22 07:12 pstibrany
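
For reference, wiring the compactor ring through dskit's ring.BasicLifecycler with an auto-forget delegate would presumably look roughly like the distributor setup. The sketch below is illustrative only: the function name and its parameters (heartbeatTimeout, numTokens, and so on) are placeholders, not actual Mimir config fields.

```go
package compactorring

import (
	"time"

	"github.com/go-kit/log"
	"github.com/grafana/dskit/kv"
	"github.com/grafana/dskit/ring"
	"github.com/prometheus/client_golang/prometheus"
)

// newCompactorLifecycler sketches how the compactor could register in the ring
// via a BasicLifecycler whose delegate chain includes auto-forget, so that
// instances with a stale heartbeat are eventually removed from the ring.
func newCompactorLifecycler(cfg ring.BasicLifecyclerConfig, kvStore kv.Client, heartbeatTimeout time.Duration, numTokens int, logger log.Logger, reg prometheus.Registerer) (*ring.BasicLifecycler, error) {
	var delegate ring.BasicLifecyclerDelegate
	delegate = ring.NewInstanceRegisterDelegate(ring.ACTIVE, numTokens)
	delegate = ring.NewLeaveOnStoppingDelegate(delegate, logger)
	// Forget instances whose last heartbeat is older than a small multiple of
	// the heartbeat timeout (the multiplier here is arbitrary for the sketch).
	delegate = ring.NewAutoForgetDelegate(2*heartbeatTimeout, delegate, logger)

	return ring.NewBasicLifecycler(cfg, "compactor", "compactor", kvStore, delegate, logger, reg)
}
```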

We already have an issue about adding auto-forget for compactors. We agreed we want to do it, but we're looking for some help to do it.

pracucci avatar Dec 15 '22 08:12 pracucci

It seems that if a compactor instance goes unhealthy, its responsibilities (compacting as well as refreshing the bucket-index) are not taken over by the remaining (healthy) compactors unless manual action is taken (initiating "Forget" for that compactor from the ring).

I think this is a bug. Auto-forget shouldn't be necessary for a healthy instance to pick up workload from an unhealthy one. However, I think what you're experiencing is a side effect of how the ring is implemented, specifically Ring.Get(): https://github.com/grafana/dskit/blob/main/ring/ring.go#L336

The compactor ring is configured with replication factor = 1 (correct). To check whether it owns a given compaction job (or tenant, for the bucket-index update), a compactor instance calls Ring.Get() and checks whether it is the instance that owns that job or tenant.

However, Ring.Get() doesn't extend the replication set in the case of unhealthy instances (that's not even supported by the Ring client, which only allows extending the replication set based on instance state, not on healthy/unhealthy). This is a property we want for the ingesters ring, but not for the compactors ring.

To fix this issue for real, we should use different logic to check whether or not an instance owns a given key. I think a way to fix it would be filtering out all unhealthy instances from the ring, and then running Ring.Get() on top of only healthy instances (similarly to what we do with shuffle sharding, where we build a sub-ring).

Compactor ring auto-forget is still desired and will work around this, but the fix I'm suggesting will be the definitive one because it will take effect sooner than auto-forget.

pracucci avatar Dec 15 '22 08:12 pracucci
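
To make the suggestion above concrete, here is a minimal, self-contained sketch of an ownership check that only considers healthy instances: it fetches all healthy instances from the ring and then does the usual "first token clockwise from the key" lookup over their tokens only. The function names (ownerAmongHealthy, instanceOwnsTenant, hashTenantID) are hypothetical, and the real fix would presumably build a proper sub-ring the way shuffle sharding does.

```go
package compactorring

import (
	"errors"
	"hash/fnv"
	"sort"

	"github.com/grafana/dskit/ring"
)

// hashTenantID is a hypothetical helper mapping a tenant ID to a ring token.
func hashTenantID(tenantID string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(tenantID))
	return h.Sum32()
}

// ownerAmongHealthy picks the owner of a key considering healthy instances
// only: an unhealthy compactor can never end up owning a tenant, so another
// instance takes over without waiting for auto-forget.
func ownerAmongHealthy(r ring.ReadRing, op ring.Operation, key uint32) (string, error) {
	healthy, err := r.GetAllHealthy(op)
	if err != nil {
		return "", err
	}

	// Index the tokens of healthy instances only.
	type tokenOwner struct {
		token uint32
		addr  string
	}
	var tokens []tokenOwner
	for _, inst := range healthy.Instances {
		for _, t := range inst.Tokens {
			tokens = append(tokens, tokenOwner{token: t, addr: inst.Addr})
		}
	}
	if len(tokens) == 0 {
		return "", errors.New("no healthy instances in the ring")
	}
	sort.Slice(tokens, func(i, j int) bool { return tokens[i].token < tokens[j].token })

	// Usual consistent-hashing lookup: first token >= key, wrapping around.
	i := sort.Search(len(tokens), func(i int) bool { return tokens[i].token >= key })
	if i == len(tokens) {
		i = 0
	}
	return tokens[i].addr, nil
}

// instanceOwnsTenant shows how a compactor instance could use the lookup to
// decide whether it owns the bucket-index update for a tenant.
func instanceOwnsTenant(r ring.ReadRing, op ring.Operation, instanceAddr, tenantID string) (bool, error) {
	owner, err := ownerAmongHealthy(r, op, hashTenantID(tenantID))
	if err != nil {
		return false, err
	}
	return owner == instanceAddr, nil
}
```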

I think a way to fix it would be filtering out all unhealthy instances from the ring, and then running Ring.Get() on top of only healthy instances (similarly to what we do with shuffle sharding, where we build a sub-ring).

I had a very similar need recently when trying to build fast ingester shutdown, except I wanted to filter out LEAVING ingesters. If we're adding functionality like this to the ring, I'd suggest adding a generic "filter instances" method to the ring, which returns a subset of the ring containing only the instances matching a supplied predicate function.

pstibrany avatar Dec 15 '22 08:12 pstibrany
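
A hypothetical shape for such a predicate-based filter, sketched against dskit's ring.Desc (the function name and exact placement are assumptions, not an existing dskit API):

```go
package compactorring

import (
	"github.com/grafana/dskit/ring"
)

// filterInstances returns a copy of the ring description containing only the
// instances matching the supplied predicate. This is a sketch of the idea,
// not an existing dskit method.
func filterInstances(desc *ring.Desc, keep func(id string, inst ring.InstanceDesc) bool) *ring.Desc {
	out := ring.NewDesc()
	for id, inst := range desc.Ingesters {
		if keep(id, inst) {
			out.Ingesters[id] = inst
		}
	}
	return out
}
```

For the fast-shutdown case mentioned above, the predicate could simply drop LEAVING instances; for the compactor case, it could drop instances whose heartbeat is stale.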

I'll take a look at adding a filter method to the ring in dskit unless someone else has started already.

56quarters avatar Dec 16 '22 14:12 56quarters

I'll take a look at adding a filter method to the ring in dskit unless someone else has started already.

I would start with the auto-forget, which looks easier to me and will be a good workaround. I think the challenge with the filter is making it efficient.

pracucci avatar Dec 16 '22 14:12 pracucci