
HDDS-7092. EC Offline Recovery with simultaneous Over Replication & Under Replication

Open · swamirishi opened this issue 2 years ago · 3 comments

What changes were proposed in this pull request?

When over-replication and under-replication happen together, it can turn out that all nodes are excluded while handling the under-replication. As a result, the under-replication handler would continuously fail and over-replication handling would never run.
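To make the failure mode concrete, here is a minimal, self-contained sketch (plain Java, not Ozone code; the node names, layout, and loop are hypothetical) of an EC(3,2) container on a 5-node cluster where replica index 1 is missing and index 2 sits on two nodes. Every datanode already holds a replica of the container, so the under-replication pass never finds an eligible target, and because under-replication is handled first, the over-replicated index is never cleaned up:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class UnderThenOverDeadlockSketch {

  public static void main(String[] args) {
    // Hypothetical layout: datanode -> EC replica index it currently holds.
    // Index 1 is missing entirely; index 2 is held by two nodes.
    Map<String, Integer> replicaIndexByNode = new LinkedHashMap<>();
    replicaIndexByNode.put("dn1", 2);
    replicaIndexByNode.put("dn2", 2); // excess copy of index 2
    replicaIndexByNode.put("dn3", 3);
    replicaIndexByNode.put("dn4", 4);
    replicaIndexByNode.put("dn5", 5);

    List<String> cluster = List.of("dn1", "dn2", "dn3", "dn4", "dn5");

    for (int pass = 1; pass <= 3; pass++) {
      // Under-replication is handled first: look for a node that does not
      // already hold a replica of this container to receive the missing index.
      Optional<String> target = cluster.stream()
          .filter(dn -> !replicaIndexByNode.containsKey(dn))
          .findFirst();

      if (target.isEmpty()) {
        // This is the loop described above: the handler fails on every pass,
        // and the over-replication of index 2 is never processed.
        System.out.println("pass " + pass
            + ": under-replication handling failed (no eligible target);"
            + " over-replication of index 2 not handled");
      }
    }
  }
}
```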

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7092

How was this patch tested?

Unit Tests, Integration Tests

swamirishi · Aug 09 '22 18:08

I'm not sure about this change. I feel it adds quite a bit of complexity to the model to handle what is really an edge case for a small cluster. I'd like a container to be handled for only a single state at a time as we could get into a position where we try reconstruction only to have a replica we are reading from removed by over-replication handling at the same time. It also makes things easier to think about, rather than the container having many states.

A few thoughts come to mind:

  1. What if we return over-replication health first, before under-replication? That way, we can fix this with a small change, by just changing the order in the ECContainerHealthCheck (see the sketch after this list). The negative is that we have to wait longer to fix an under-replication, but the change here would be small and the general flow stays the same.

  2. Or, inside the under-replication handler, we have access to the methods in ECContainerReplicaCount which can tell us if there are over-replicated indexes. If, and only if, we get a failure in the UnderRep handler due to not finding enough nodes, we can somehow switch to over-replication handling and remove the excess replicas. However that starts to muddle the over and under replication logic together, which I was trying to avoid.
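For what it is worth, here is a rough sketch of option 1. It is not the real ECContainerHealthCheck code; the HealthState enum and ReplicaCount interface are simplified stand-ins. The only point is that reporting over-replication before under-replication lets the excess replica be removed first, which frees a node for the later reconstruction:

```java
public class ReorderedHealthCheckSketch {

  enum HealthState { HEALTHY, OVER_REPLICATED, UNDER_REPLICATED }

  interface ReplicaCount {
    boolean isOverReplicated();
    boolean isUnderReplicated();
  }

  // Option 1: report over-replication before under-replication, so the excess
  // replica is removed first and a node frees up for later reconstruction.
  static HealthState check(ReplicaCount count) {
    if (count.isOverReplicated()) {
      return HealthState.OVER_REPLICATED;
    }
    if (count.isUnderReplicated()) {
      return HealthState.UNDER_REPLICATED;
    }
    return HealthState.HEALTHY;
  }

  public static void main(String[] args) {
    // A container that is both over- and under-replicated: with this ordering
    // the over-replication is reported (and handled) first.
    System.out.println(check(new ReplicaCount() {
      public boolean isOverReplicated() { return true; }
      public boolean isUnderReplicated() { return true; }
    }));
  }
}
```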

I probably need to think about it a bit more.

sodonnel · Aug 22 '22 14:08

> we could get into a position where we try reconstruction only to have a replica we are reading from removed by over-replication handling at the same time.

I am not sure this patch adds much more complexity than what we already have, but the case you pointed out is a good reason to rethink this patch.

I agree this is a corner case and only possible on smaller clusters. However, we should still fix it, as some test clusters can get into this situation and we should have some solution for it. I think delaying under-replication is not a good idea. Or should we check whether the cluster size and the over-replication count are the same, and only then return over-replication first? I know that would introduce one more ugly check though.

umamaheswararao · Aug 22 '22 17:08

I feel we should handle this in the under-replication handler. In normal circumstances the container can be under-replicated and over-replicated at the same time, but if we can create enough new replicas to fix the under-replication we don't care about the over-replication at this stage.

However if we find we cannot place enough new replicas, we can check for over-replication. If it is over replicated too, we can refactor the code so we can call into the over-replication handler and schedule the delete container commands.
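As a rough illustration of that flow (the interfaces below are hypothetical stand-ins, not the real Ozone handlers or the ECContainerReplicaCount API), the under-replication handler would try placement as usual and only fall back to scheduling deletes when placement fails and the container is also over-replicated:

```java
import java.util.List;
import java.util.Optional;

public class UnderRepFallbackSketch {

  interface ReplicaCount {
    boolean isOverReplicated();
    List<Integer> overReplicatedIndexes();
  }

  interface Placement {
    // Returns the chosen target nodes, or empty if not enough free nodes exist.
    Optional<List<String>> chooseTargets(int required);
  }

  interface OverReplicationHandler {
    void scheduleDeletes(List<Integer> excessIndexes);
  }

  static void handleUnderReplication(ReplicaCount count, Placement placement,
      OverReplicationHandler overRepHandler, int missingReplicas) {
    Optional<List<String>> targets = placement.chooseTargets(missingReplicas);
    if (targets.isPresent()) {
      // Normal path: enough nodes are available, so schedule the
      // reconstruction/copy and ignore any over-replication for now.
      System.out.println("reconstructing on " + targets.get());
      return;
    }
    // Fallback path: placement failed. If the container is also
    // over-replicated, schedule deletes for the excess indexes so a node
    // frees up and the next under-replication pass can succeed.
    if (count.isOverReplicated()) {
      overRepHandler.scheduleDeletes(count.overReplicatedIndexes());
    }
  }

  public static void main(String[] args) {
    ReplicaCount count = new ReplicaCount() {
      public boolean isOverReplicated() { return true; }
      public List<Integer> overReplicatedIndexes() { return List.of(2); }
    };
    Placement noFreeNodes = required -> Optional.empty();
    OverReplicationHandler overRep =
        excess -> System.out.println("scheduling deletes for indexes " + excess);
    handleUnderReplication(count, noFreeNodes, overRep, 1);
  }
}
```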

This would avoid removing replicas we may depend on for reconstruction or copy. It also avoids the race condition where we queue both an under- and an over-replication, and the under-replication keeps failing until the over-replication is processed and completed.

If we handle it in the under replication handler, the health check code still only returns a single state (healthy, under or over replicated). When we integrate the Ratis code, it is not possible for a container to be both under and over replicated, so it keeps the code consistent between the two flows, which is helpful to understanding in the future.

What do you think?

sodonnel · Aug 23 '22 11:08

Fixed with an alternative approach in #3984

sodonnel · Nov 23 '22 09:11