roxy
When replicas are specified in ml-config, wipe fails if there is only one host; replication is allowed with fewer than 3 hosts.
Two parts to this problem:
- If ml-config.xml specifies a replica but there is only one host, bootstrap succeeds (with no warnings, and no replicas created). However, a subsequent wipe fails with "ERROR: XDMP-DIVBYZERO". The cause is that the host-count check that bypasses "reassign-replicas" is performed only on the create step, not on the wipe step.
- Replication is not usable for failover unless the cluster has at least 3 hosts (to ensure a quorum). In general, the host count should be "2n+1", where n is the number of replicas. A warning should be generated when replication is requested with fewer than 3 hosts. A warning COULD also be generated when the "2n+1" rule is not followed.
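The two checks described above can be sketched as one shared decision function that both the create and the wipe step would call, so they can never disagree. This is a minimal illustration in Python, not Roxy's actual code: the function name `replica_plan`, its parameters, and the warning texts are all hypothetical.

```python
def replica_plan(host_count, replica_count):
    """Decide whether replica assignment should run, and collect warnings.

    Hypothetical helper: the name, signature, and messages are assumptions
    made for illustration and do not come from Roxy.
    Returns a tuple (assign_replicas, warnings).
    """
    warnings = []

    # No replicas requested: nothing to assign, nothing to warn about.
    if replica_count == 0:
        return False, warnings

    # A single host has nowhere to place a replica. Skip assignment,
    # but say so instead of ignoring the configuration silently.
    if host_count < 2:
        warnings.append(
            "replicas are configured but only one host exists; "
            "skipping replica assignment"
        )
        return False, warnings

    # With fewer than 3 hosts a replica can be created, but the cluster
    # cannot keep a quorum after losing a host.
    if host_count < 3:
        warnings.append(
            "replication requested with fewer than 3 hosts; "
            "failover will not be usable"
        )
    # Optional stricter check: the "2n+1" rule from the report.
    elif host_count < 2 * replica_count + 1:
        warnings.append(
            "host count is below 2n+1 for n = %d replicas" % replica_count
        )

    return True, warnings
```

Calling the same function from both steps means a single-host environment skips replica reassignment on wipe exactly as it does on bootstrap, which is the path that currently ends in the division by zero.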
Reproduce:
- Problem 1 is reproducible by:
- Edit a default ml-config.xml to set up replication for a content forest:
<assignment>
  <forest-name>${content-db}</forest-name>
  <replica-names>
    <replica-name>${content-db}-rep1</replica-name>
  </replica-names>
  @ml.forest-data-dir-xml
</assignment>
<assignment>
  <forest-name nr-replicas="1">${content-db}-rep1</forest-name>
  @ml.forest-data-dir-xml
</assignment>
- ./ml dev bootstrap
- ./ml dev wipe
- The wipe should report the division-by-zero error (XDMP-DIVBYZERO).
- Problem 2 is reproducible by:
- Edit a default ml-config.xml to set up replication for a content forest:
<assignment>
  <forest-name>${content-db}</forest-name>
  <replica-names>
    <replica-name>${content-db}-rep1</replica-name>
  </replica-names>
  @ml.forest-data-dir-xml
</assignment>
<assignment>
  <forest-name nr-replicas="1">${content-db}-rep1</forest-name>
  @ml.forest-data-dir-xml
</assignment>
- ./ml dev bootstrap
- Silently ignores replication.
- Create a 2-node cluster and do the same steps as above.
- Creates a replica, but failover would be unusable.
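To illustrate why the 2-node case is unusable for failover, here is a small arithmetic sketch. It assumes the usual majority rule (a cluster keeps quorum only while more than half of its hosts are up); the function names are hypothetical and are not part of Roxy or MarkLogic.

```python
def has_quorum(total_hosts, hosts_up):
    """Majority rule: strictly more than half of the hosts must be up."""
    return hosts_up > total_hosts / 2


def survives_one_failure(total_hosts):
    """True if the cluster still has quorum after losing a single host."""
    return has_quorum(total_hosts, total_hosts - 1)


# Compare the cluster sizes discussed in this report.
for hosts in (1, 2, 3):
    print(hosts, survives_one_failure(hosts))
```

With 2 hosts, losing one leaves exactly half of the cluster, which is not a majority, so the replica exists but cannot take over. With 3 hosts, the remaining 2 still form a majority.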
Which Operating System are you using? Linux and Mac OS-X
Which version of MarkLogic are you using? 9.0-2
Which version of Roxy are you using (see version.txt)? 1.7.6 and 1.7.7
I have a fix for this coded already, and we are currently testing it on our project.
We already have a fix for this; Joe McGroarty has implemented it.
@tdiepenbrock If you could share the changes applied, that would be grand. Offline, PR, or in a comment.
Oh! I'll send that tonight....
I submitted the pull request with the changes. Let me know if you need anything else.