
Gnosis indexer sync unstable

Open · kaladinlight opened this issue 2 years ago · 1 comment

Overview

Recently, the gnosis indexer has been repeatedly falling out of sync after catching back up (stay-synced mode). We need to investigate where the issue lies so we can ensure the indexer stays in sync.

References and additional details

  • gnosis

Acceptance Criteria

  • Fix sync issue
  • Improve the liveness probe to catch the indexer falling out of sync and auto-restart the container (see the sketch below)
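
On the liveness-probe criterion, here is a minimal sketch of what such a check could look like, assuming the indexer's tip height and the backend node's tip height are both reachable from inside the container. The `indexerHeight`/`backendHeight` helpers and the 50-block threshold are illustrative assumptions, not existing Blockbook APIs; the point is that the probe endpoint starts returning 503 once the indexer lags the backend by more than the threshold, so Kubernetes restarts the container.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// maxLag is the number of blocks the indexer may trail the backend
// before the probe reports failure (threshold chosen for illustration).
const maxLag = 50

// indexerHeight and backendHeight are hypothetical helpers standing in
// for however the indexer exposes its own tip and the node's tip.
func indexerHeight() (uint64, error) { /* read local index tip */ return 0, nil }
func backendHeight() (uint64, error) { /* query the gnosis node */ return 0, nil }

// healthz is the endpoint a Kubernetes liveness probe would hit.
// If the indexer has fallen too far behind, it returns 503 so the
// kubelet restarts the container.
func healthz(w http.ResponseWriter, r *http.Request) {
	ih, err1 := indexerHeight()
	bh, err2 := backendHeight()
	if err1 != nil || err2 != nil || bh > ih+maxLag {
		http.Error(w, fmt.Sprintf("out of sync: indexer=%d backend=%d", ih, bh),
			http.StatusServiceUnavailable)
		return
	}
	fmt.Fprintln(w, "ok")
}

func main() {
	http.HandleFunc("/healthz", healthz)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```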

Need By Date

No response

Screenshots/Mockups

No response

Estimated effort

No response

kaladinlight · Jan 16 '24 20:01

Current summary:

  • The bottleneck is two-part:
  1. encoding/packing and decoding/unpacking of address contract data
  2. reading and writing address contract data to the db

To address number 1, I have updated the manual byte slice packing and unpacking logic to use protobuf-encoded blobs, which are more efficient. This has made a difference in both compute performance and memory overhead.
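
As a rough illustration of this kind of change (the message layout, field names, and package paths below are assumptions for the sketch, not Blockbook's actual schema), the hand-rolled byte-slice packing is replaced by a protobuf message that is marshaled to the blob stored in the db:

```go
// addrcontracts.proto (illustrative schema only):
//
//   syntax = "proto3";
//   message ContractInfo {
//     bytes  contract = 1;   // contract address
//     uint64 txs      = 2;   // number of transfers seen
//     bytes  value    = 3;   // big.Int balance bytes
//   }
//   message AddrContracts {
//     uint64 total_txs = 1;
//     repeated ContractInfo contracts = 2;
//   }
//
// After generating Go code with protoc, the generated types replace the
// manual byte-slice packing and unpacking.
package db

import (
	"google.golang.org/protobuf/proto"

	pb "example.com/indexer/db/pb" // hypothetical generated package
)

// packAddrContracts serializes the record to the blob stored in the db.
func packAddrContracts(ac *pb.AddrContracts) ([]byte, error) {
	return proto.Marshal(ac)
}

// unpackAddrContracts decodes a blob read back from the db.
func unpackAddrContracts(buf []byte) (*pb.AddrContracts, error) {
	ac := &pb.AddrContracts{}
	if err := proto.Unmarshal(buf, ac); err != nil {
		return nil, err
	}
	return ac, nil
}
```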

To address number 2, specifically in single-worker sync mode (the desired default, since it avoids the db corruption introduced by multi-worker sync), an extra in-memory cache layer has been added to reduce the frequency of db reads at the cost of a larger memory footprint. Since the indexer currently cannot index faster than the chain produces blocks, this trade-off is necessary at this juncture.
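
A minimal sketch of the idea, assuming the record is looked up by address descriptor before each update (the type and method names are hypothetical, and the real implementation may bound the cache rather than letting it grow without limit):

```go
package db

import "sync"

// addrContractsCache is a minimal read-through cache in front of the db
// for the single-worker sync path. It trades memory for fewer db reads.
type addrContractsCache struct {
	mu    sync.Mutex
	items map[string][]byte // key: address descriptor, value: packed record
}

func newAddrContractsCache() *addrContractsCache {
	return &addrContractsCache{items: make(map[string][]byte)}
}

// get returns the cached blob, falling back to the db loader on a miss
// and remembering the result for subsequent reads of the same address.
func (c *addrContractsCache) get(addrDesc string, load func(string) ([]byte, error)) ([]byte, error) {
	c.mu.Lock()
	if v, ok := c.items[addrDesc]; ok {
		c.mu.Unlock()
		return v, nil
	}
	c.mu.Unlock()

	v, err := load(addrDesc) // hypothetical db read, e.g. a RocksDB get
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	c.items[addrDesc] = v
	c.mu.Unlock()
	return v, nil
}

// put refreshes the cache when the record is rewritten during block processing.
func (c *addrContractsCache) put(addrDesc string, blob []byte) {
	c.mu.Lock()
	c.items[addrDesc] = blob
	c.mu.Unlock()
}
```

During single-worker sync the same hot addresses are touched block after block, so most reads become map lookups rather than db gets, at the cost of the extra resident memory noted above.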

There is currently a fresh sync in progress with the protobuf data structure, for which I have been using parallel workers to catch up more quickly. This naturally introduces OOM-kill scenarios, which cause db corruption and force a restart from the previous snapshot, but progress is being made. Once fully in sync, we will be able to properly test single-worker sync mode to confirm it keeps up with the chain and to see what the "stay synced" memory footprint ends up being in this mode. From there, we can run additional profiles to home in on further optimizations if necessary.

It is worth noting that we are approaching the lowest level of performance optimization available to us: beyond this point, new storage strategies, compression algorithms, encoding strategies, db options, etc. would need to be investigated to handle the extreme bloat and throughput of these L2 chains, which is not a good place for us to be.

We would be left with a few options at that point:

  • Continue down the perf road and keep spending resources on optimization (at that point I would most likely need help brainstorming and working through new ideas)
  • Since the majority of this overhead is related to nft indexing, cut nfts from blockbook entirely and rely fully on 3rd party nft apis to populate the nft dashboard. At first take this would primarily affect nft transaction history (we would need to think through any other ramifications)
  • Start looking into other 3rd party index apis (balance + tx history) that could be used in lieu of blockbook for chains that require it

kaladinlight · Apr 15 '24 19:04

The proto update with the extended address cache in single-worker sync mode looks to be sufficient. Waiting for the sync to complete (still slow, but catching up) and will officially close upon confirmation that blockbook stays in sync.

kaladinlight · May 03 '24 22:05

Closing for #1008.

0xean · Jul 02 '24 19:07