archethic-node
Open more nodes in validation nodes election
Describe the problem you discovered
Currently, a low UCO transfer amount is validated by 3 validation nodes, and at least 2 nodes are required to validate a transaction. Suppose 2 nodes in the network suddenly disconnect but are still flagged as globally available. If the election for a transaction selects 3 nodes including the 2 disconnected ones, the transaction cannot be validated until those nodes are marked globally unavailable by the next self-repair (or the one after). The transaction is therefore stuck for a long time, which is especially annoying if it is a network transaction.
Describe the solution you'd like
Here is a possible solution. When a welcome node receives a transaction, it runs the election (assume 3 nodes). If 2 of the elected nodes are locally unavailable (they may be disconnected), we already know at this point that the transaction will not be validated, because only 1 or 0 validation nodes will receive the StartMining message.
Currently, the maximum number of validation nodes is 12. The election could therefore always return 12 nodes, and the welcome node would then select the required number of validation nodes out of these 12, keeping only locally available ones (this selection could be random?). For the replication and validation nodes to check that the election is correct, the welcome node can send both the list of the 12 elected nodes (which can be verified by re-running the same election) and the nodes selected to validate the transaction. The selected nodes must be part of the list of 12.
With this behavior, we keep the same verification of the election but gain the flexibility to select locally available nodes to validate the transaction, and we do not leave transactions waiting for the next self-repair.
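To make the proposal concrete, here is a minimal Python sketch of the two steps described above: the welcome node picking locally available nodes out of the 12 elected ones, and the receiving nodes checking that pick. It is only an illustration, not the node's actual code. The function names, the node identifiers, and the idea of passing a recomputed election as a plain list are all assumptions made for the example.

```python
import random

MAX_VALIDATION_NODES = 12  # maximum election size mentioned in this issue


def select_validation_nodes(elected, locally_available, needed, rng=random):
    """Welcome-node side (hypothetical helper): pick `needed` nodes among
    the elected ones that the welcome node sees as locally available."""
    candidates = [node for node in elected if node in locally_available]
    if len(candidates) < needed:
        raise ValueError("not enough locally available nodes in the election")
    return rng.sample(candidates, needed)


def is_selection_valid(claimed_election, selected, recomputed_election, needed):
    """Validation/replication-node side (hypothetical helper): the claimed
    election must match the locally recomputed one, and the selected nodes
    must be `needed` distinct members of that election."""
    return (
        list(claimed_election) == list(recomputed_election)
        and len(selected) == needed
        and len(set(selected)) == needed
        and set(selected) <= set(claimed_election)
    )


# Example: node0 and node1 are disconnected from the welcome node's view.
elected = [f"node{i}" for i in range(MAX_VALIDATION_NODES)]
available = set(elected) - {"node0", "node1"}
selected = select_validation_nodes(elected, available, needed=3)
print(is_selection_valid(elected, selected, elected, needed=3))
```

Note that the check only proves the selected nodes belong to the election. It cannot prove that the skipped nodes were really unavailable, since that depends on the welcome node's local view.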
While this can make the election more flexible, it goes somewhat against the Global, Unpredictable & Reproducible Election paradigm. Even if it uses the authorized nodes as the first source of the election, the outcome becomes subjective because it depends on the welcome node's view, which makes later verification of the election non-reproducible and non-global.
The issue might only be between the welcome node and some validation nodes, and not between the validation nodes themselves. Hence, to ensure fault tolerance, I would prefer delegating the StartMining message between the validation nodes instead of changing the election paradigm.
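As a rough illustration of the delegation idea, the Python sketch below models which elected nodes end up with the StartMining message when every validation node that receives it forwards it to the other elected nodes. The `can_reach` map, the node names, and the function itself are hypothetical, and the sketch assumes the welcome node is not one of the elected nodes.

```python
from collections import deque


def start_mining_recipients(welcome, elected, can_reach):
    """Return the elected nodes that eventually receive StartMining.

    `can_reach` maps a node to the set of nodes it can currently contact
    (an assumed model of connectivity, not a real node API).
    """
    elected_set = set(elected)
    received = set()
    queue = deque([welcome])
    while queue:
        sender = queue.popleft()
        for target in can_reach.get(sender, set()) & elected_set:
            if target not in received:
                received.add(target)
                queue.append(target)  # this validator forwards the message in turn
    return received


# The welcome node only reaches v1, but v1 can reach v2 and v3.
links = {"welcome": {"v1"}, "v1": {"v2", "v3"}}
print(sorted(start_mining_recipients("welcome", ["v1", "v2", "v3"], links)))
```

In this model, forwarding helps when the connectivity problem is limited to the welcome node. If no node can reach v2 and v3, the set stays at one node and the transaction still cannot be validated.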
It is true that the validation nodes can send the StartMining message to each other, but this only works if the welcome node has connection problems with other nodes. If the nodes are really down, the transaction will always fail. Maybe we should have a retry mechanism, but, as with the proposed solution, a welcome node could run a retry even if the first election was good.
Another idea would be to increase the minimum number of validation nodes. It would reduce the probability of an election in which all nodes are disconnected, but the problem would still be there (even with the proposed solution).
but it works only if the welcome node has connection problem with other nodes. But if the node are really down, in that case the transaction will always fail.
In all cases, if 2 out of 3 nodes are off, the network is not in good shape and can be considered dysfunctional. Hence, the transaction cannot be validated.
An other idea would be to increase the minimum number of validation nodes
Increasing the minimum number of validation nodes to a higher number, for instance 5, will increase the fault tolerance. We should use an overbooking election to make sure the validation nodes will accept transactions.
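To put a rough number on the fault-tolerance gain, here is a small Python sketch that computes the chance of a stuck election. It rests on strong simplifications that are assumptions of the example, not properties of the real election: nodes are drawn uniformly at random, the network size (20) and number of down nodes (2) are made up, and a transaction still needs 2 reachable validation nodes.

```python
from math import comb


def stuck_probability(total_nodes, down_nodes, elected, required):
    """Probability that fewer than `required` of the `elected` nodes are up,
    assuming a uniformly random election (hypergeometric model)."""
    up_nodes = total_nodes - down_nodes
    stuck_elections = sum(
        comb(up_nodes, up) * comb(down_nodes, elected - up)
        for up in range(required)
    )
    return stuck_elections / comb(total_nodes, elected)


# Hypothetical network: 20 nodes, 2 of them down, 2 reachable validators needed.
print(stuck_probability(20, 2, elected=3, required=2))
print(stuck_probability(20, 2, elected=5, required=2))
```

Under these assumptions, 18 of the 1140 possible 3-node elections (about 1.6%) contain both down nodes, whereas a 5-node election always keeps at least 3 reachable nodes when only 2 nodes are down.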