Two indexers in write mode compete to add next block, loser locks up and cannot recover

Open willscripted opened this issue 3 years ago • 0 comments

Subject of the issue

When two indexers in write mode compete to add the next block, the loser will occasionally enter an infinite loop of failures.

The workaround is to run only one writer at a time, but our deployments go through Cloud Run, where the old and new instances overlap briefly during a deployment. If the new instance enters this failed state, progress halts, because Cloud Run then kills the old, outdated process.

2022-09-11 14:33:25.000 EDT
{"level":"info", "msg":"adding block 23992647"}
2022-09-11 14:33:25.000 EDT
{"error":"Process() handler err: AddBlock() err: TxWithRetry() err: attemptTx() err: AddBlock() adding block round 23992647 but next round to account is 23992648", "level":"error", "msg":"block 23992647 import failed"}
2022-09-11 14:33:26.000 EDT
{"level":"info", "msg":"adding block 23992647"}
2022-09-11 14:33:26.000 EDT
{"error":"Process() handler err: AddBlock() err: TxWithRetry() err: attemptTx() err: AddBlock() adding block round 23992647 but next round to account is 23992648", "level":"error", "msg":"block 23992647 import failed"}
2022-09-11 14:33:27.000 EDT
{"level":"info", "msg":"adding block 23992647"}
2022-09-11 14:33:27.000 EDT
{"error":"Process() handler err: AddBlock() err: TxWithRetry() err: attemptTx() err: AddBlock() adding block round 23992647 but next round to account is 23992648", "level":"error", "msg":"block 23992647 import failed"}
2022-09-11 14:33:29.000 EDT
{"level":"info", "msg":"adding block 23992647"}
2022-09-11 14:33:29.000 EDT
{"error":"Process() handler err: AddBlock() err: TxWithRetry() err: attemptTx() err: AddBlock() adding block round 23992647 but next round to account is 23992648", "level":"error", "msg":"block 23992647 import failed"}
2022-09-11 14:33:30.000 EDT
{"level":"info", "msg":"adding block 23992647"}

Your environment

Running on Cloud Run.

Algod indexer version: 2.14.0

Dockerfile entrypoint.sh

Steps to reproduce

Start two indexers with the same configuration in write mode.
Wait until both attempt to add the same block.

Expected behaviour

One instance will successfully commit the current round; the other will fail to commit it. Both will then be capable of committing the next block. Neither will get stuck retrying the originally contested block.

Actual behaviour

One indexer will successfully commit the current round. One will fail and enter a loop trying to commit the round it lost.

Possibly related: https://github.com/algorand/indexer/issues/336, https://github.com/algorand/indexer/issues/1173

Sep 11 '22 18:09 willscripted