HDDS-10372. SCM and Datanode communication for reconciliation

Open errose28 opened this issue 10 months ago • 1 comments

What changes were proposed in this pull request?

A lot of boilerplate code to do something very simple:

  • Tell SCM to start reconciliation for a container from the CLI.
  • Have SCM tell Datanodes to reconcile that container with their peers.
  • Datanodes send back a placeholder container data checksum, which the real reconciliation implementation can replace later.
    • There is no communication between datanodes added in this change.
  • SCM updates its replica info based on the container report received after the Datanodes reconcile.
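
The flow above can be pictured with a toy in-memory model. This is only a sketch under stated assumptions: every class, field, and method name below is hypothetical and not taken from the Ozone code base, the heartbeat and container report are collapsed into direct method calls, and the checksum is a made-up placeholder string.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Toy model of the reconcile flow. All names are hypothetical, not Ozone classes. */
public class ReconcileFlowSketch {

  /** What a datanode reports back after handling a reconcile command. */
  static final class ReplicaReport {
    final long containerId;
    final String datanodeId;
    final String dataChecksum;

    ReplicaReport(long containerId, String datanodeId, String dataChecksum) {
      this.containerId = containerId;
      this.datanodeId = datanodeId;
      this.dataChecksum = dataChecksum;
    }
  }

  static final class Datanode {
    final String id;
    // Container IDs of reconcile commands waiting to be handled.
    final Queue<Long> reconcileQueue = new ArrayDeque<>();

    Datanode(String id) {
      this.id = id;
    }

    /** Drains the queue. No peer communication happens; only a placeholder checksum is produced. */
    List<ReplicaReport> processQueue() {
      List<ReplicaReport> reports = new ArrayList<>();
      Long containerId;
      while ((containerId = reconcileQueue.poll()) != null) {
        reports.add(new ReplicaReport(containerId, id, "placeholder-" + containerId));
      }
      return reports;
    }
  }

  static final class Scm {
    // containerId -> datanodes holding a replica
    final Map<Long, List<Datanode>> replicaLocations = new HashMap<>();
    // containerId -> (datanodeId -> last reported data checksum), kept in memory only
    final Map<Long, Map<String, String>> replicaChecksums = new HashMap<>();

    /** Step 1 and 2: the CLI request reaches SCM, which queues a command on each replica's datanode. */
    void reconcile(long containerId) {
      for (Datanode dn : replicaLocations.getOrDefault(containerId, Collections.<Datanode>emptyList())) {
        dn.reconcileQueue.add(containerId);
      }
    }

    /** Step 4: SCM updates its replica info from the report it receives. */
    void handleReport(ReplicaReport report) {
      replicaChecksums
          .computeIfAbsent(report.containerId, k -> new HashMap<>())
          .put(report.datanodeId, report.dataChecksum);
    }
  }

  public static void main(String[] args) {
    Datanode dn1 = new Datanode("dn1");
    Datanode dn2 = new Datanode("dn2");
    Scm scm = new Scm();
    scm.replicaLocations.put(7L, Arrays.asList(dn1, dn2));

    scm.reconcile(7L);
    // Step 3: each datanode handles its queue and reports back.
    for (Datanode dn : Arrays.asList(dn1, dn2)) {
      for (ReplicaReport report : dn.processQueue()) {
        scm.handleReport(report);
      }
    }
    System.out.println(scm.replicaChecksums.get(7L));
  }
}
```

In the actual change the commands travel on SCM heartbeat responses and the checksums come back in container reports; the direct calls here only stand in for that plumbing.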

I've tried to avoid making any design related decisions in this PR. It is intended as a skeleton we can use to plug in the reconciliation implementation for end to end testing in future changes.

In scope for this change

  • Add new ozone admin container reconcile <container-id> command.
  • New command should be restricted to admins
  • Audit logging for new command
  • Blocking reconciliation of invalid containers (EC, 1 replica, still open)
  • Datanode queue metrics for reconciliation commands
  • Datanode and SCM application logs to follow the command as it moves through the system.
  • SCM saves container replicas' data checksums in memory, and they can be retrieved with ozone admin container info --json
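
The blocking of invalid containers listed above could look roughly like the check below. It is a minimal sketch, not the PR's code: the enums are simplified stand-ins for Ozone's real replication and lifecycle types, and the method name, parameters, and rejection messages are all assumptions made for illustration.

```java
import java.util.Optional;

/** Hypothetical eligibility check. Names and messages are illustrative only. */
public class ReconcileEligibilitySketch {

  /** Simplified stand-in for the container's replication type. */
  enum ReplicationType { RATIS, EC }

  /** Simplified stand-in for the container's lifecycle state. */
  enum LifeCycleState { OPEN, CLOSING, CLOSED }

  /**
   * Returns an empty Optional when the container may be reconciled,
   * otherwise a human-readable reason for rejecting the request.
   */
  static Optional<String> rejectionReason(
      ReplicationType type, LifeCycleState state, int replicaCount) {
    if (type == ReplicationType.EC) {
      return Optional.of("EC containers cannot be reconciled");
    }
    if (state == LifeCycleState.OPEN) {
      return Optional.of("container is still open");
    }
    if (replicaCount <= 1) {
      return Optional.of("container has fewer than two replicas");
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // A closed, three-replica Ratis container passes all three checks.
    System.out.println(
        rejectionReason(ReplicationType.RATIS, LifeCycleState.CLOSED, 3).isPresent());
    // An open container is rejected with a reason the CLI could surface.
    System.out.println(
        rejectionReason(ReplicationType.RATIS, LifeCycleState.OPEN, 3).orElse("eligible"));
  }
}
```

Returning a reason instead of a boolean lets the same check feed both the error shown to the admin and the audit log entry.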

Out of scope for this change (but will be handled in later tasks)

  • Any actual checksum related implementations
    • Currently byte strings are used as placeholders just to move filler data around for testing.
  • Recon integration with container data checksums
    • This includes Recon's ContainerReplicaHistoryProto
  • Finalized protobuf changes
    • Since the change is going to a feature branch we have the flexibility to evolve the protos later.
  • Good UX 😄
    • This includes flags for the reconcile command, an easy way to track reconciliation progress, and reading containers from stdin like other container subcommands support.
    • These will need some discussion so are probably best done as their own set of changes.
  • The following tasks have been moved out to do in follow up changes:
    • HDDS-10714 datanode status filtering for reconciliation peers and targets
    • HDDS-10759 Consider allowing reconciliation when not all replicas have reached closed state
    • HDDS-10760 SCMExceptions resulting from admin CLI commands are treated as retriable

What is the link to the Apache JIRA?

HDDS-10372

How was this patch tested?

  • Acceptance test for CLI added
  • Manually tested the CLI with valid and invalid containers, and manually checked SCM audit logging.
  • Unit and integration tests added in the following classes:
    • TestReconcileContainerEventHandler: Tests SCM's filtering of reconciliation requests based on eligible container and replica states. When containers are eligible, tests that reconcile commands are sent to datanodes.
    • TestStateContext: Tests that the new command shows up in datanode queue metrics.
    • TestReconcileContainerCommandHandler: Tests datanode queue and runtime metrics when a reconcile command is received. Also tests that the ICR sent as the result of the command has the expected data checksum.
    • TestContainerDataYaml: Tests that the data checksum is not written to the .container file. Merkle tree information will be written to its own file in a different change.
    • TestHeartbeatEndpointTask: Tests that datanodes add a reconcile command to their queue when it is received on an SCM heartbeat response.
    • TestKeyValueHandler: Tests that the KeyValueHandler triggers an ICR back to SCM with the expected values when reconciliation is invoked.
    • TestContainerReportHandler, TestIncrementalContainerReportHandler: Tests that SCM correctly saves replicas' data checksum information it receives on a heartbeat.

errose28, Apr 10 '24 06:04

At last the CI is green

errose28, May 07 '24 03:05