mission-control-indexer
Subgraphs stuck at 50 blocks behind after a fix to correct an Eth node providing a block number in the future
An Ethereum archive node I was using ended up providing a block in the future to my graph-node, which caused my subgraphs to stop syncing. I am not alone in encountering this: https://github.com/graphprotocol/mission-control-indexer/issues/84
At the time I ran select * from ethereum_networks; and got the following:
name | head_block_hash | head_block_number | net_version | genesis_block_hash
---------+------------------------------------------------------------------+-------------------+-------------+------------------------------------------------------------------
mainnet | f41c5acc79ad40586e79d50f184d048803af9969ccfe30a76f5cba3fc6a4467c | 11562297 | 1 | d4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3
(1 row)
I manually adjusted the head_block_number to a current value, following a report of success taking this approach. Upon starting my graph-node again all my subgraphs began making progress, with the exception of Moloch, which had already been fully synced and now showed as being a negative number of blocks behind!
After leaving this running for a while I saw all the other subgraphs come to a pause at 50 blocks behind. At this point I realized that I had forgotten to adjust the head_block_hash to match the new head_block_number, so I went and made that change. However, it had no effect. I then adjusted both the head_block_hash and head_block_number to the latest values at the time. All subgraphs updated to a higher number of blocks behind and started moving forward again, but then stopped once more at 50. At no point do the head_block_hash and head_block_number change as they normally do; they remain stuck on the values I set.
I have successfully run the Uniswap patch on this database since then and it has been able to get Uniswap syncing. I have tried running the scripts to remove unused deployments (in the hope of removing Moloch) but they find 0 unassigned subgraphs, despite there being several.
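In case it helps with triage, the assignment state those cleanup scripts work from can be inspected directly. This assumes the subgraphs.subgraph_deployment_assignment table in graph-node's metadata schema (name from memory; it may differ between versions):
select * from subgraphs.subgraph_deployment_assignment;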
I am attaching logs for 2 of the subgraphs stuck at 50 blocks behind when the head_block_number was manually set to 11060963 (QmZ28XHpz1BNhR5jU2ABCqo2PqHutAPE8woaW1tYdDFDqV, QmW5UcMhXwXrMqRpTdT3MKB7HYDxbsp5oe9Atkuh4sgAA5):
LunaNova_TheGraph_QmZ28XHpz1BNhR5jU2ABCqo2PqHutAPE8woaW1tYdDFDqV_20201018.log
LunaNova_TheGraph_QmW5UcMhXwXrMqRpTdT3MKB7HYDxbsp5oe9Atkuh4sgAA5_20201018.log
and a log of the graph-node in trace mode when I had later adjusted the head_block_number to 11077195:
LunaNova_TheGraph_withTrace_20201018.log
I notice that when I run select latest_ethereum_block_number, failed from subgraphs.subgraph_deployment; it returns:
latest_ethereum_block_number | failed
------------------------------+--------
11562297 | f
11077145 | f
9565812 | t
9603655 | f
10088717 | f
10973256 | t
11077145 | f
10452655 | t
11077145 | f
11077145 | f
11077145 | f
11060912 | f
10866663 | t
8350009 | t
11014325 | t
11077145 | f
11077145 | f
10606159 | t
(18 rows)
The one with the block in the future (11562297) is Moloch and I think if I can get this cleanly removed it may get things moving again. I would value advice on the best way to do this or any other viable approaches to resolve this situation.
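For reference, that row can be pulled out on its own by filtering the same table on the future block number:
select * from subgraphs.subgraph_deployment where latest_ethereum_block_number = 11562297;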
I could just hose the database and start again but, as I've already synced quite a lot of subgraphs, this seems wasteful, and as I'm not the only one to encounter this issue (and it could happen again!), it seems prudent to work out some sort of recovery strategy in lieu of a definitive fix for the root cause.
If anyone needs any more info, please let me know.
This requires rewinding all your subgraphs. I put instructions on how to do that here.
I have successfully run the script to rewind the subgraphs and I can confirm that this appears to have resolved the issues! :-) @lutter, thank you ever so much for your help.