neofs-node Revise blockchain height check on startup

Inner Ring and Storage nodes check that the height of the underlying blockchain is greater than or equal to the latest encountered height, which is optionally persisted in the local storage (config and config respectively).

The app requests the current height via RPC, compares the result with the persisted one, and fails if the local value is greater.

Which chain is stuck?

According to @aprasolova's experience, we can encounter the following error in the log:

RPC block counter 738108 didn't reach expected height 2272533

The message does not show which chain (main or side) is stuck. It is proposed to include the blockchain kind in this log message.
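One way the chain kind could be reflected in the message is sketched below. The `chainKind` type, its values and `heightError` are hypothetical; the exact wording neofs-node would use may differ.

```go
package main

import "fmt"

// chainKind is a hypothetical label distinguishing the two chains
// a NeoFS node may be connected to.
type chainKind string

const (
	mainChain chainKind = "main chain"
	sideChain chainKind = "side chain"
)

// heightError builds the height-check error with the chain kind prepended,
// so the log shows which chain has not reached the expected height.
func heightError(kind chainKind, current, expected uint32) error {
	return fmt.Errorf("%s: RPC block counter %d didn't reach expected height %d",
		kind, current, expected)
}

func main() {
	fmt.Println(heightError(sideChain, 738108, 2272533))
}
```

With the values from the log above, this prints `side chain: RPC block counter 738108 didn't reach expected height 2272533`.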

Await or not await

It's possible that the chain node is currently synchronizing its state and hasn't reached the up-to-date state yet. In this case the NeoFS node will fail immediately. Instead, it could wait within some context (global, or with a sane deadline) and free the admin from having to restart the app periodically.

btw, in the code the check function is called awaitHeight, which implies waiting, but in fact it does not wait.

Maybe there are other signs that would let NeoFS understand what exactly is happening at the moment and distinguish between a freeze and synchronization, for example… If so, we could improve the behavior and the admin UX. @AnnaShaleva @roman-khimov

Blockchain reset

If the chain was reset and the admin restarts the node, it will fail until the fresh chain reaches a height not less than the persisted one. In this case it's not obvious to the admin that the local state should be reset too. As a possible solution, we could also take the blockchain network magic into account, but the magic may also be left unchanged by the reset.

Jul 06 '23 17:07 cthulhu-rider

btw in code check function is called awaitHeight which syntactically implies a background wait, but in fact does not wait.

There is some history here: it did wait in #798, but the waiting was also removed in the same PR. So maybe @532910 has some info about it (and about the issue in general).

Jul 06 '23 18:07 carpawell

I also started thinking about connection switching in a multi-RPC setup. @carpawell, you're currently the expert on this; please explain how such a reconnection could affect our state sync.

Jul 06 '23 18:07 cthulhu-rider

This block counter can't be perfect since local state can be dropped at any time. But it helps in some ways, so:

- specifying the network in the log is good;
- waiting is OK as well: in some ways it's like a connection failure, so there's no reason to fail completely.

distinguish between freeze and synchronization

There is no 100% reliable way to do that. But the StartWhenSynchronized RPC option helps somewhat: at least the node is supposed to be up to date when it starts serving RPC (so this problem shouldn't happen at all).

if chain was reset

Just forget this for now.

Jul 19 '23 19:07 roman-khimov
