Gnosis indexer sync unstable
Overview
Recently the gnosis indexer has been continually falling out of sync after catching back up (stay-synced mode). We need to investigate where the issue lies so we can ensure the indexer stays in sync.
References and additional details
- gnosis
Acceptance Criteria
- Fix the sync issue
- Improve the liveness probe to detect the indexer falling out of sync and automatically restart the container (see the sketch below)
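For the probe improvement, a minimal sketch of one possible approach, assuming the probe can compare the indexer's best block height against the backend node's height (the function names and lag threshold here are hypothetical placeholders, not Blockbook's actual API):

```go
package main

import (
	"fmt"
	"net/http"
)

// maxLag is the number of blocks the indexer may trail the backend
// before the probe reports unhealthy (tune to the chain's block time).
const maxLag = 30

// Placeholder height sources; a real probe would query the indexer's
// internal state and the backend node's RPC.
func indexerHeight() int { return 0 }
func backendHeight() int { return 0 }

// livenessHandler returns 503 when the indexer has fallen too far
// behind, so the container orchestrator restarts the container.
func livenessHandler(w http.ResponseWriter, r *http.Request) {
	lag := backendHeight() - indexerHeight()
	if lag > maxLag {
		http.Error(w, fmt.Sprintf("indexer is %d blocks behind", lag), http.StatusServiceUnavailable)
		return
	}
	fmt.Fprintln(w, "ok")
}

func main() {
	http.HandleFunc("/healthz", livenessHandler)
	http.ListenAndServe(":8080", nil)
}
```

Wired into the orchestrator's liveness probe, sustained lag past the threshold would fail the check and trigger the automatic restart.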
Need By Date
No response
Screenshots/Mockups
No response
Estimated effort
No response
Current summary:
- The bottleneck is two-part:
  - encode/pack and decode/unpack of address contract data
  - read and write of address contract data to the db
To address item 1, I have updated the manual byte-slice packing and unpacking logic to leverage protobuf-encoded blobs, which are more efficient. This has made a difference in both compute performance and memory overhead.
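A rough sketch of the shape of this change; the message layout, package path, and helper names below are illustrative assumptions, not the actual schema:

```go
// Hypothetical AddrContracts message generated into package pb, e.g.:
//
//   message AddrContracts {
//     uint64 total_txs = 1;
//     repeated bytes contract_addresses = 2;
//     repeated uint64 contract_txs = 3;
//   }
package db

import (
	"google.golang.org/protobuf/proto"

	pb "example.com/indexer/gen/pb" // hypothetical generated package
)

// packAddrContracts replaces the manual byte-slice packing: the record
// is marshaled to a single protobuf blob before the db write.
func packAddrContracts(ac *pb.AddrContracts) ([]byte, error) {
	return proto.Marshal(ac)
}

// unpackAddrContracts replaces the manual unpacking on db reads.
func unpackAddrContracts(buf []byte) (*pb.AddrContracts, error) {
	ac := &pb.AddrContracts{}
	if err := proto.Unmarshal(buf, ac); err != nil {
		return nil, err
	}
	return ac, nil
}
```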
To address item 2, specifically in single-worker sync mode (the desired default, to avoid the db corruption introduced by multi-worker sync mode), an extra in-memory cache layer has been added to reduce the frequency of db reads at the cost of additional memory footprint. Since the current state is unable to index faster than the blockchain produces blocks, this trade-off is necessary at this juncture.
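A minimal sketch of that cache layer, assuming single-worker sync (one writer) and reusing the hypothetical message and unpack helper from the sketch above; `dbGet` is a stand-in for the real RocksDB read:

```go
// (same package and imports as the sketch above)

// dbGet is a placeholder for the RocksDB lookup.
func dbGet(addrDesc []byte) ([]byte, error) { return nil, nil }

// addrContractsCache keeps recently used address contract records in
// memory so repeated lookups skip the db read and unpack entirely.
type addrContractsCache struct {
	items map[string]*pb.AddrContracts
}

func newAddrContractsCache() *addrContractsCache {
	return &addrContractsCache{items: make(map[string]*pb.AddrContracts)}
}

// get serves from memory when possible and falls back to the db,
// populating the cache on a miss. The footprint grows with the working
// set of hot addresses -- the trade-off described above.
func (c *addrContractsCache) get(addrDesc []byte) (*pb.AddrContracts, error) {
	key := string(addrDesc)
	if ac, ok := c.items[key]; ok {
		return ac, nil // hit: no db read, no unpack
	}
	buf, err := dbGet(addrDesc)
	if err != nil {
		return nil, err
	}
	ac, err := unpackAddrContracts(buf)
	if err != nil {
		return nil, err
	}
	c.items[key] = ac
	return ac, nil
}
```

A real implementation would cap or periodically flush the map (e.g. LRU eviction) to bound the extra footprint.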
There is currently a fresh sync in progress with the protobuf data structure, for which I have been leveraging parallel workers to catch up more quickly. This naturally introduces OOM-kill scenarios that result in db corruption and the need to resume from a previous snapshot, but progress is being made. Once fully in sync, we will be able to properly test single-worker sync mode to ensure performance is enough to keep up with the chain, and to see what the "stay synced" memory footprint ends up being in this mode. From there, we can run additional profiles to home in on further optimizations if necessary.
It is worth noting that we are getting about as low-level as performance optimizations go; beyond this point, new storage strategies, compression algorithms, encoding strategies, db options, etc. would need to be investigated to handle the extreme bloat and throughput of these L2 chains, which is not a good place for us to be.
We would be left with a few options at that point:
- Continue down the perf road and keep spending resources on optimization (I would most likely need help brainstorming and working through new ideas at that point)
- Since the majority of this overhead is related to NFT indexing, cut NFTs from blockbook entirely and rely fully on 3rd party NFT APIs to populate the NFT dashboard. At first take this would primarily affect NFT transaction history (would need to think through any other ramifications)
- Start looking into other 3rd party indexing APIs (balance + tx history) that could be used in lieu of blockbook for chains that require it
The proto update with the extended address cache in single-worker sync mode is looking to be sufficient. Waiting for the sync to complete (still slow, but catching up); will officially close upon confirmation that blockbook stays in sync.
Closing for #1008