
Ledger Sync: Add block hash checkpoint to ease ledger sync.

Open · Jim8y opened this issue 2 years ago • 21 comments

Summary or problem description: Currently it takes days to synchronize the whole Neo chain. The main reason is that for every block downloaded from the P2P network we have to verify its witness to make sure the block is authentic and intact, yet a change of the consensus nodes can only be known after the previous blocks have been processed, which is sequential and takes a very long time to finish. Thus we need to figure out a way to break this constraint.

Do you have any solution you want to propose? Add a block hash checkpoint during consensus, so that users can download the whole block hash file, which is easier to verify, separately. Then users either download a fast-sync file provided by others, or sync the whole ledger from the chain without verifying the witnesses, checking only the block hashes.

Data Structure:

Block hash file:

`blockhash | blockhash | blockhash | .... | ... `

Consensus address file:

`address | index | address | index | address | index ... | ...`

Expected speed up:

From days to less than 1 hour.
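
A minimal sketch of how a syncing node might consume such a block hash file (the one-hash-per-line layout and all helper names here are assumptions for illustration, not actual Neo or NeoGo APIs): load the checkpoint once, then accept each downloaded block if its header hash matches the expected entry for its height, skipping witness verification entirely.

```go
package sketch

import (
	"bufio"
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"os"
)

// loadCheckpointHashes reads a file of hex-encoded block hashes, one per line,
// where line i holds the expected hash of the block at height i.
// (Hypothetical layout; the issue only sketches "blockhash | blockhash | ...".)
func loadCheckpointHashes(path string) ([][]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var hashes [][]byte
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		h, err := hex.DecodeString(sc.Text())
		if err != nil {
			return nil, err
		}
		hashes = append(hashes, h)
	}
	return hashes, sc.Err()
}

// blockHash stands in for however the node computes a block's hash from its
// serialized header (plain SHA-256 here purely for illustration).
func blockHash(rawHeader []byte) []byte {
	h := sha256.Sum256(rawHeader)
	return h[:]
}

// acceptBlock checks a downloaded block against the checkpoint file instead of
// verifying its multisignature witness.
func acceptBlock(checkpoint [][]byte, height int, rawHeader []byte) bool {
	if height >= len(checkpoint) {
		return false // beyond the checkpoint: fall back to full witness verification
	}
	return bytes.Equal(blockHash(rawHeader), checkpoint[height])
}
```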

Neo Version

  • Neo 3

Where in the software does this update apply?

  • Consensus
  • CLI
  • Plugins
  • Ledger
  • Network Policy
  • P2P (TCP)

Jim8y avatar Jan 08 '24 10:01 Jim8y

the main reason is that for every block downloaded from the P2P network we have to verify its witness

No. Verifying witnesses takes some time (you can check this with the SkipBlockVerification NeoGo option), but that's not the major problem.

Add a block hash checkpoint during consensus

How can you verify the authenticity of this file? At genesis you trust the set of validators from the config, but then it changes. Standby validators can't sign it, and even if they sign, it's irrelevant (the real set has changed). If the file is signed by the new set of validators, how do you ensure this set is correct?

roman-khimov avatar Jan 08 '24 12:01 roman-khimov

@roman-khimov I know, I have been evaluating it over the past few days. First of all, yes, witness verification is part of the cause (removing it can improve the speed by around 2x), but network sync is a bigger issue, and this proposal can help address that. Then, the validators do change, indeed, but not frequently; we can create one checkpoint every time the 7 validators update, which won't decrease performance.

And once we catch up to the latest one, we can ask them to directly sign the validator update history so we can have a unified checkpoint regardless of further changes.

Oh wait, we need to address the history signature issue: users need to be able to verify the authenticity of the latest validator list. I will keep thinking; any suggestions? @roman-khimov

Jim8y avatar Jan 08 '24 13:01 Jim8y

In the meantime, this is really a question of how a light node can verify the authenticity of a given block without syncing the whole blockchain.

Jim8y avatar Jan 08 '24 13:01 Jim8y

It should be easy to fix if we add the execution result to the storage.

Jim8y avatar Jan 08 '24 13:01 Jim8y

A single core can do roughly 10K ECDSA verifications per second, which means we can verify ~2K blocks per second. That's your verification overhead. At the current height of ~4.6M blocks it's 2300s, or about 40m. Now compare that to the overall synchronization time: it's likely not just 1h30m, so removing verification won't yield a two-fold improvement.

A light node operates with headers, it has to sync and verify all of them. The cost of verification is outlined above.

A regular node needs #2373 then.

roman-khimov avatar Jan 08 '24 14:01 roman-khimov
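
As a back-of-envelope check of the numbers above (the 10K verifications/s figure comes from the comment; 5 signatures per block is the implied 5-of-7 multisig witness), a tiny sketch:

```go
package main

import "fmt"

func main() {
	const (
		ecdsaPerSec   = 10_000.0    // single-core ECDSA verifications per second (figure from the comment)
		sigsPerBlock  = 5.0         // 5-of-7 multisig witness => ~5 verifications per block (inferred)
		currentHeight = 4_600_000.0 // approximate chain height at the time of the comment
	)
	blocksPerSec := ecdsaPerSec / sigsPerBlock   // ~2K blocks/s
	totalSeconds := currentHeight / blocksPerSec // ~2300 s
	fmt.Printf("~%.0f blocks/s, ~%.0f s (~%.0f min) of pure witness verification\n",
		blocksPerSec, totalSeconds, totalSeconds/60)
}
```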

A single core can do roughly 10K ECDSA verifications per second, which means we can verify ~2K blocks per second. That's your verification overhead. At the current height of ~4.6M blocks it's 2300s, or about 40m. Now compare that to the overall synchronization time: it's likely not just 1h30m, so removing verification won't yield a two-fold improvement.

A light node operates with headers, it has to sync and verify all of them. The cost of verification is outlined above.

A regular node needs #2373 then.

I need to update my data: as I sync more blocks, the overhead hotspot becomes execution (contract calls, system calls) and the VM reference check; previously I only evaluated for dozens of minutes. But still, if we can avoid the witness check, then with your data we can still save 40 minutes, right?

Jim8y avatar Jan 08 '24 16:01 Jim8y

@roman-khimov I know, I have been evaluating it over the past few days. First of all, yes, witness verification is part of the cause (removing it can improve the speed by around 2x), but network sync is a bigger issue, and this proposal can help address that. Then, the validators do change, indeed, but not frequently; we can create one checkpoint every time the 7 validators update, which won't decrease performance.

And once we catch up to the latest one, we can ask them to directly sign the validator update history so we can have a unified checkpoint regardless of further changes.

Oh wait, we need to address the history signature issue: users need to be able to verify the authenticity of the latest validator list. I will keep thinking; any suggestions? @roman-khimov

Have you tried caching the witness? I remember that this saved a lot of time for me in the past, but Erik doesn't like it 🤣

shargon avatar Jan 08 '24 16:01 shargon

https://github.com/neo-project/neo/pull/2616 was already merged 😅

shargon avatar Jan 08 '24 16:01 shargon

if we can avoid the witness check

We already have this option

shargon avatar Jan 08 '24 16:01 shargon

#2616 was already merged 😅

It's different; what I mean here is the ECDSA verification for each block. It cannot be avoided unless we can verify block authenticity in another way.

Jim8y avatar Jan 08 '24 17:01 Jim8y

#2616 was already merged 😅

It's different; what I mean here is the ECDSA verification for each block. It cannot be avoided unless we can verify block authenticity in another way.

If you sign the previous hash together with the previous signatures in the next block, you can ensure that everything is well formed by verifying only the last block, but... we can have different signatures and still be valid.

shargon avatar Jan 08 '24 17:01 shargon

But if we modified the consensus to include the hash of the previous block's witness, we could ensure that the block is valid, because we could replace the previous block's signatures with deterministic ones 🙃

shargon avatar Jan 08 '24 17:01 shargon
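
A rough sketch of the verification flow this idea would enable, assuming a hypothetical PrevWitnessHash field in the header (field names and hashing are illustrative, not the actual Neo types): only the tip block's multisig witness needs an ECDSA check; every earlier block is pinned by hash links alone.

```go
package sketch

import (
	"bytes"
	"crypto/sha256"
)

// Header is an illustrative stand-in for a Neo block header extended with a
// commitment to the previous block's witness, as proposed above.
type Header struct {
	PrevHash        []byte // hash of the previous block's header
	PrevWitnessHash []byte // hash of the previous block's witness (the new field)
	Witness         []byte // serialized invocation + verification scripts
}

// hashOf is illustrative only: it hashes the header fields. The witness itself
// is deliberately NOT part of the block hash, which is exactly why a separate
// PrevWitnessHash commitment is needed.
func hashOf(h *Header) []byte {
	sum := sha256.Sum256(append(append([]byte{}, h.PrevHash...), h.PrevWitnessHash...))
	return sum[:]
}

// hashWitness stands in for however the node hashes a serialized witness.
func hashWitness(w []byte) []byte {
	sum := sha256.Sum256(w)
	return sum[:]
}

// verifyChainBackwards checks a contiguous run of headers ending at the tip.
// Only the tip needs an ECDSA multisig check (verifyWitness); every earlier
// block is pinned by the next block's PrevHash and PrevWitnessHash fields.
func verifyChainBackwards(headers []*Header, verifyWitness func(*Header) bool) bool {
	n := len(headers)
	if n == 0 || !verifyWitness(headers[n-1]) {
		return false
	}
	for i := n - 1; i > 0; i-- {
		prev, cur := headers[i-1], headers[i]
		if !bytes.Equal(cur.PrevHash, hashOf(prev)) {
			return false
		}
		if !bytes.Equal(cur.PrevWitnessHash, hashWitness(prev.Witness)) {
			return false
		}
	}
	return true
}
```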

#2616 was already merged 😅

It's different; what I mean here is the ECDSA verification for each block. It cannot be avoided unless we can verify block authenticity in another way.

If you sign the previous hash together with the previous signatures in the next block, you can ensure that everything is well formed by verifying only the last block, but... we can have different signatures and still be valid.

That is actually what I had in mind before, but the problem is that you have to go through those blocks without knowing whether they are valid until you check the last witness. So sad.

Jim8y avatar Jan 08 '24 17:01 Jim8y

#2616 was already merged 😅

It's different; what I mean here is the ECDSA verification for each block. It cannot be avoided unless we can verify block authenticity in another way.

If you sign the previous hash together with the previous signatures in the next block, you can ensure that everything is well formed by verifying only the last block, but... we can have different signatures and still be valid.

That is actually what I had in mind before, but the problem is that you have to go through those blocks without knowing whether they are valid until you check the last witness. So sad.

Yes, but for me it's not a huge problem.

shargon avatar Jan 08 '24 17:01 shargon

The problem as I see it is the threshold: P2P only allows 500 hashes at a time, then it stops and processes the data. With my custom node I'm able to download the whole blockchain in seconds, but processing takes time. And if it's invalid data, it requests that block or transaction again. It's simple.

cschuchardt88 avatar Jan 09 '24 00:01 cschuchardt88

The problem as I see it is the threshold: P2P only allows 500 hashes at a time, then it stops and processes the data. With my custom node I'm able to download the whole blockchain in seconds, but processing takes time. And if it's invalid data, it requests that block or transaction again. It's simple.

That is exactly what I am trying to do: make it easier to validate that data so that we can download it in seconds. Imagine that we can verify the authenticity of received blocks by just checking their hash; then it does not matter how we get the ledger data, whether we download it directly from somewhere or sync it from the P2P network in a fast manner.

Jim8y avatar Jan 09 '24 04:01 Jim8y

Yes, but for me it's not a huge problem.

Great, this is what I want. I think I may be able to come up with an idea that can sync the whole blockchain within one hour. I'll try.

Jim8y avatar Jan 09 '24 04:01 Jim8y

@Jim8y FYI, there are some message commands in the protocol that haven't been implemented yet. I think it's related; check it out.

cschuchardt88 avatar Jan 09 '24 04:01 cschuchardt88

The problem as I see it is the threshold: P2P only allows 500 hashes at a time.

I'm not sure if you're suggesting changing the default 500 to some larger number. Believe me, 500 is a relatively safe number; making it larger can expose node operators to risks.

dusmart avatar Feb 02 '24 09:02 dusmart

The problem as I see it is the threshold: P2P only allows 500 hashes at a time.

I'm not sure if you're suggesting changing the default 500 to some larger number. Believe me, 500 is a relatively safe number; making it larger can expose node operators to risks.

How can it be safe? Please go into detail. The protocol doesn't care: the limit is set to 500 block hashes per P2P message for requesting blocks. That isn't a problem, since you can request, for example, 500 blocks and then the next 500 blocks in a matter of milliseconds; you can already request blocks 0-1,000,000 this way from any node on the network. The limit is set in the DataCache, and there is no good enough reason to allow only a 500-block cache. This is why, when you download from the network, you see with the `show state` command that the right-hand number goes up so fast, then stops, and then the left one is trying to catch up. One shouldn't be waiting for the other. If we are offering the blockchain for download at https://sync.ngd.network/, then why not just do the same and download the blockchain data from the network, and wait for it to process? There is no difference between the two besides how long it takes to download.

cschuchardt88 avatar Feb 02 '24 17:02 cschuchardt88
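
For context, the 500-hash ceiling discussed here is per message, not per connection, so a syncing node can pipeline requests in 500-block windows. A rough sketch with a hypothetical peer interface (not the actual neo or neo-go P2P API):

```go
package sketch

import "fmt"

// Peer is a hypothetical stand-in for a P2P connection that answers
// GetBlockByIndex-style requests.
type Peer interface {
	// RequestBlocks asks for up to count blocks starting at start
	// and returns their raw payloads.
	RequestBlocks(start uint32, count uint16) ([][]byte, error)
}

const maxHashesPerRequest = 500 // the per-message limit discussed in the thread

// downloadRange fetches blocks [from, to) in windows of at most 500, handing
// each batch to process (e.g. persisting, or verifying against a checkpoint).
func downloadRange(p Peer, from, to uint32, process func(start uint32, blocks [][]byte) error) error {
	for start := from; start < to; {
		count := to - start
		if count > maxHashesPerRequest {
			count = maxHashesPerRequest
		}
		blocks, err := p.RequestBlocks(start, uint16(count))
		if err != nil {
			return err
		}
		if len(blocks) == 0 {
			return fmt.Errorf("peer returned no blocks at height %d", start)
		}
		if err := process(start, blocks); err != nil {
			return err
		}
		start += uint32(len(blocks))
	}
	return nil
}
```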

How can it be safe? Please go into detail.

This safety point is from the provider's view. Serving too many blocks/transactions in a single P2P message can be a huge burden for the provider. As you pointed out, requesting blocks/transactions from a peer node over and over again is the scenario I really care about. Making this limit larger could make it much easier for attackers to mount such an attack. And I have indeed tried this kind of attack before on a private net: a normal machine could be overwhelmed by multiple malicious peers if we set this number larger.

This limit applies not only to blocks but also to transactions. Sometimes a transaction body can be really large, such as a deploy transaction, and this is an entry point for attack. I know that the limit cannot prevent such attacks completely, but that is no reason to make them even easier.

https://github.com/neo-project/neo/blob/92d487c1b04e44b41c485045682f7f0f6a32b6cc/src/Neo/Network/P2P/Payloads/InvPayload.cs#L27

There is no good enough reason to allow only a 500-block cache.

I won't comment on this. Maybe the cache is something that can be improved.

If we are offering the blockchain for download at https://sync.ngd.network/, then why not just do the same and download the blockchain data from the network, and wait for it to process? There is no difference between the two.

No. IMO, letting NGD's server carry the download burden is much better than putting it on the NEO mainnet. The P2P network's first job is to maintain the current state and provide service for new transactions and new blocks. Those who want to start from the beginning should not count on the relatively slower P2P network to be a fast choice.

dusmart avatar Feb 03 '24 04:02 dusmart