vega
[Feature]: Allow automatic restart when the data node is behind the core but a local snapshot is present on disk
Feature Overview
The data node often falls behind the core when the system is unhealthy for some reason (e.g. high traffic, disk IOPS issues, etc.). When that happens, we see the following error:
Jan 08 14:45:29 api0.vega.community visor[3749951]: vega data node stopped with error: block height on begin block, 30391214, is too high, the height of the last processed block is 30391119
It says the last block processed by the data node is 30391119, while the core is at block 30391214.
You can see that the data node was only 95 blocks behind. However, a snapshot was created at block 30391120.
To restart the node, we have to specify the core start height, either with the --snapshot.load-from-block-height flag or with the corresponding config param in the <vega_home>/config/node/config.toml file.
This is not very practical. We could add a config option that, when this error happens, finds the latest available local snapshot (if there is one) and restarts from it automatically.
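To make the idea concrete, here is a minimal sketch in Go of what such logic might look like. It is not taken from the vega codebase: `parseHeights` and `pickSnapshot` are hypothetical helpers, the list of local snapshot heights is made up apart from 30391120, and the selection rule (the highest snapshot at or below the data node's last processed block plus one) is an assumption based on the example above.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// heightErr matches the data node error quoted above and captures both heights.
var heightErr = regexp.MustCompile(
	`block height on begin block, (\d+), is too high, the height of the last processed block is (\d+)`)

// parseHeights (hypothetical helper) extracts the core block height and the
// data node's last processed height from a log line.
func parseHeights(line string) (uint64, uint64, bool) {
	m := heightErr.FindStringSubmatch(line)
	if m == nil {
		return 0, 0, false
	}
	core, err := strconv.ParseUint(m[1], 10, 64)
	if err != nil {
		return 0, 0, false
	}
	dataNode, err := strconv.ParseUint(m[2], 10, 64)
	if err != nil {
		return 0, 0, false
	}
	return core, dataNode, true
}

// pickSnapshot (hypothetical helper) returns the highest snapshot height that
// is at most dataNode+1. This selection rule is an assumption, not vega's
// documented behaviour.
func pickSnapshot(snapshots []uint64, dataNode uint64) (uint64, bool) {
	sorted := append([]uint64(nil), snapshots...) // copy so the input is not reordered
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	for _, h := range sorted {
		if h <= dataNode+1 {
			return h, true
		}
	}
	return 0, false
}

func main() {
	line := "vega data node stopped with error: block height on begin block, 30391214, " +
		"is too high, the height of the last processed block is 30391119"

	core, dataNode, ok := parseHeights(line)
	if !ok {
		fmt.Println("not a block height mismatch error")
		return
	}
	fmt.Printf("data node is %d blocks behind the core\n", core-dataNode)

	// Illustrative list of local snapshot heights; only 30391120 comes from the issue.
	local := []uint64{30390520, 30390820, 30391120}
	if h, found := pickSnapshot(local, dataNode); found {
		fmt.Printf("restart with --snapshot.load-from-block-height=%d\n", h)
	} else {
		fmt.Println("no usable local snapshot found")
	}
}
```

With the log line from this issue, the sketch reports a gap of 95 blocks and suggests restarting from the snapshot at 30391120. In a real implementation the snapshot heights would come from the node's local snapshot store rather than a hard-coded list.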