
When replicas are specified in ml-config, wipe fails if the number of hosts is one; replication is allowed with fewer than 3 hosts.

Open jamsilvia opened this issue 7 years ago • 4 comments

Two parts to this problem:

  1. If ml-config.xml specifies a replica but there is only one host, bootstrap succeeds (with no warnings, and no replicas created). However, a subsequent wipe fails with "ERROR: XDMP-DIVBYZERO". The cause is that the host-count check that bypasses "reassign-replicas" is done only in the create step, not in the wipe step.
  2. Replication is not usable for failover unless the number of hosts is at least 3 (to ensure a quorum). In general, the host count should be "2n+1", where n is the number of replicas. A warning should be generated when replication is requested for fewer than 3 hosts. A warning COULD be generated when the "2n+1" rule is not followed.
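To make the proposed warnings concrete, here is a minimal sketch in Python (not Roxy's own code). The function name `replication_warnings` and the message texts are hypothetical, and the sketch assumes "not followed" means fewer hosts than 2n+1.

```python
from typing import List


def replication_warnings(host_count: int, replica_count: int) -> List[str]:
    """Return warnings for a requested replication setup (hypothetical helper)."""
    warnings: List[str] = []
    if replica_count <= 0:
        # No replication requested, so there is nothing to check.
        return warnings

    recommended_hosts = 2 * replica_count + 1  # the "2n+1" rule from the issue

    if host_count < 3:
        # Proposed mandatory warning: no quorum is possible for failover.
        warnings.append(
            f"replication requested with {host_count} host(s); "
            "failover needs at least 3 hosts for a quorum"
        )
    elif host_count < recommended_hosts:
        # Proposed optional warning (assumption: only "too few" hosts is flagged).
        warnings.append(
            f"{replica_count} replicas on {host_count} hosts; "
            f"the 2n+1 rule suggests {recommended_hosts} hosts"
        )
    return warnings


if __name__ == "__main__":
    for hosts, replicas in [(1, 1), (2, 1), (3, 1), (3, 2), (5, 2)]:
        print(hosts, replicas, replication_warnings(hosts, replicas))
```

With one replica, the two rules coincide, since 2n+1 is 3; the second branch only matters once more than one replica is requested.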

Reproduce:

  1. Problem 1 is reproducible by:
  • Edit a default ml-config.xml to set up replication for a content forest:
    <assignment>
      <forest-name>${content-db}</forest-name>
      <replica-names>
        <replica-name>${content-db}-rep1</replica-name>
      </replica-names>
      @ml.forest-data-dir-xml
    </assignment>
    <assignment>
      <forest-name nr-replicas="1">${content-db}-rep1</forest-name>
      @ml.forest-data-dir-xml
    </assignment>
  • ./ml dev bootstrap
  • ./ml dev wipe
  • Should report the division by zero error.
  2. Problem 2 is reproducible by:
  • Edit a default ml-config.xml to set up replication for a content forest:
    <assignment>
      <forest-name>${content-db}</forest-name>
      <replica-names>
        <replica-name>${content-db}-rep1</replica-name>
      </replica-names>
      @ml.forest-data-dir-xml
    </assignment>
    <assignment>
      <forest-name nr-replicas="1">${content-db}-rep1</forest-name>
      @ml.forest-data-dir-xml
    </assignment>
  • ./ml dev bootstrap
  • Silently ignores replication.
  • Create a 2-node cluster and do the same steps as above.
  • Creates a replica, but failover would be unusable.
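The failure mode in Problem 1 can be sketched as follows. This is an illustration in Python, not the Roxy source: the issue does not show the actual code, so the round-robin placement rule and all function names here are assumptions. The point is only that a host-count guard applied in one step but not the other leaves a division by zero reachable on a single-host cluster.

```python
from typing import List


def should_reassign_replicas(host_count: int, replica_count: int) -> bool:
    """Replicas need a second host to live on; skip the step otherwise."""
    return replica_count > 0 and host_count > 1


def replica_host_index(primary_index: int, replica_index: int, host_count: int) -> int:
    """Assumed placement rule: spread replicas over the hosts other than the primary."""
    other_hosts = host_count - 1  # zero on a single-host cluster
    # The modulo below raises ZeroDivisionError when other_hosts == 0.
    offset = 1 + (replica_index % other_hosts)
    return (primary_index + offset) % host_count


def reassign_replicas(primary_index: int, replica_count: int, host_count: int) -> List[int]:
    """Guarded step, meant to be shared by both the create and the wipe paths."""
    if not should_reassign_replicas(host_count, replica_count):
        return []
    return [
        replica_host_index(primary_index, i, host_count)
        for i in range(replica_count)
    ]


if __name__ == "__main__":
    print(reassign_replicas(0, 1, 1))  # single host: step is skipped
    print(reassign_replicas(0, 1, 2))  # two hosts: replica goes to the other host
```

Routing both steps through the same guarded function, rather than repeating the check in each, is one way to keep create and wipe from drifting apart again.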

Which Operating System are you using? Linux and Mac OS-X

Which version of MarkLogic are you using? 9.0-2

Which version of Roxy are you using (see version.txt)? 1.7.6 and 1.7.7

I have a fix for this coded already, and we are currently testing it on our project.

jamsilvia, Aug 16 '17 16:08

We have a fix already for this, Joe McGroarty has implemented it.

tdiepenbrock, Aug 28 '17 23:08

@tdiepenbrock If you could share the changes applied, that would be grand. Offline, as a PR, or in a comment.

grtjn, Sep 05 '17 13:09

Oh! I'll send that tonight....

jamsilvia, Sep 05 '17 21:09

I submitted the pull request with the changes. Let me know if you need anything else.

jamsilvia, Sep 06 '17 02:09