Two indexers in write mode compete to add next block, loser locks up and cannot recover
Subject of the issue
When two indexers in write mode compete to add the next block, the loser will occasionally enter an infinite loop of failures.
The workaround is to run only one writer at a time, but our deployments are done via Cloud Run and the old and new instances overlap briefly during a deployment. If the newcomer enters this failed state, progress halts, because Cloud Run then kills the old, outdated process and only the stuck instance remains.
2022-09-11 14:33:25.000 EDT
{"level":"info", "msg":"adding block 23992647"}
2022-09-11 14:33:25.000 EDT
{"error":"Process() handler err: AddBlock() err: TxWithRetry() err: attemptTx() err: AddBlock() adding block round 23992647 but next round to account is 23992648", "level":"error", "msg":"block 23992647 import failed"}
2022-09-11 14:33:26.000 EDT
{"level":"info", "msg":"adding block 23992647"}
2022-09-11 14:33:26.000 EDT
{"error":"Process() handler err: AddBlock() err: TxWithRetry() err: attemptTx() err: AddBlock() adding block round 23992647 but next round to account is 23992648", "level":"error", "msg":"block 23992647 import failed"}
2022-09-11 14:33:27.000 EDT
{"level":"info", "msg":"adding block 23992647"}
2022-09-11 14:33:27.000 EDT
{"error":"Process() handler err: AddBlock() err: TxWithRetry() err: attemptTx() err: AddBlock() adding block round 23992647 but next round to account is 23992648", "level":"error", "msg":"block 23992647 import failed"}
2022-09-11 14:33:29.000 EDT
{"level":"info", "msg":"adding block 23992647"}
2022-09-11 14:33:29.000 EDT
{"error":"Process() handler err: AddBlock() err: TxWithRetry() err: attemptTx() err: AddBlock() adding block round 23992647 but next round to account is 23992648", "level":"error", "msg":"block 23992647 import failed"}
2022-09-11 14:33:30.000 EDT
{"level":"info", "msg":"adding block 23992647"}
Your environment
Running on Cloud Run.
Algod indexer version: 2.14.0
Steps to reproduce
- Start two indexers with the same configuration in write mode
- Wait until both try to commit the same round (the failure is intermittent).
Expected behaviour
One instance will successfully commit the current round. The other will fail to commit it. Both will remain capable of committing the next block, and neither will get stuck trying to commit the originally contested block.
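To make the expected behaviour concrete, here is a minimal sketch in Go. It is not the indexer's actual code: `store`, `roundMismatchError`, and `importRound` are hypothetical names, and the in-memory `store` only stands in for the database. The idea is that when a commit fails because the database has already moved past the attempted round, the loser resumes from the round the database reports instead of retrying the round it lost.

```go
package main

import (
	"errors"
	"fmt"
)

// roundMismatchError is a hypothetical typed error carrying the same
// information as the "adding block round X but next round to account is Y"
// message in the log.
type roundMismatchError struct {
	Attempted uint64
	Next      uint64
}

func (e *roundMismatchError) Error() string {
	return fmt.Sprintf("adding block round %d but next round to account is %d",
		e.Attempted, e.Next)
}

// store is a hypothetical in-memory stand-in for the shared database.
type store struct {
	nextRound uint64
}

// AddBlock commits a round only if it is exactly the next expected round.
func (s *store) AddBlock(round uint64) error {
	if round != s.nextRound {
		return &roundMismatchError{Attempted: round, Next: s.nextRound}
	}
	s.nextRound++
	return nil
}

// importRound tries to commit round and returns the round the caller
// should attempt next. If another writer already committed this round,
// it skips ahead to the database's next round instead of retrying.
func importRound(s *store, round uint64) (uint64, error) {
	err := s.AddBlock(round)
	if err == nil {
		return round + 1, nil
	}
	var mismatch *roundMismatchError
	if errors.As(err, &mismatch) && mismatch.Next > round {
		// Lost the race: the round is already in the database.
		return mismatch.Next, nil
	}
	// Any other failure is returned so the caller can retry the same round.
	return round, err
}

func main() {
	db := &store{nextRound: 23992647}

	// Writer A wins the race for round 23992647.
	nextA, errA := importRound(db, 23992647)
	// Writer B loses the race for the same round.
	nextB, errB := importRound(db, 23992647)

	fmt.Println(nextA, errA)
	fmt.Println(nextB, errB)
}
```

In this sketch both writers end up targeting the same next round, so either of them can commit it. A mismatch in the other direction (the database is behind the attempted round) is still returned as an error, since skipping ahead there would not be safe.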
Actual behaviour
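One indexer will successfully commit the current round. The other will fail and enter a loop, retrying the round it lost about once per second (see the log above) even though the error reports that the next round to account is already 23992648.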
Possibly related: https://github.com/algorand/indexer/issues/336, https://github.com/algorand/indexer/issues/1173