reth
reth copied to clipboard
Discussion: 1 body per request vs multiple bodies per request
Creating this issue to unblock #220 and continue the discussion here.
The question is: What are the tradeoffs between requesting 1 body per request to a peer vs requesting multiple bodies per request to a peer?
Currently:
- The body downloader requests 1 body per call to the P2P layer, but does multiple calls at the same time
- This is done because the downloader returns bodies in order of block number, not the order in which requests are fulfilled
- This means that the downloader assumes that the body returned is valid (but as soon as it is handed over to the stage, it is validated)
Alternatively:
- We request multiple bodies per call to the P2P layer
- There is no order guarantee on the bodies returned by the P2P layer, so we have to match ourselves
- A way to match is to calculate the transactions root of the body, however, in order to match this root with a header, we need to store a mapping of
TxRoot => HeaderHash, and the downloader would need a RO transaction to the database (which is not the case currently)
In other words: Is the added complexity (extra database calls, more computation) worth the upside? What is the upside?
Worth noting:
- If we compute the transaction root as soon as the request is fulfilled, we might compute the transaction root for a large number of bodies we later discard, if they are after an invalid body. This is not currently the case, since the transaction root is calculated in the order the blocks are handed over to the stage
- The alternative requires a new table for only that part of sync (it is not used by e.g. RPC)
I am mainly worried about potential DoS vectors - a malicious node could send us a list of transactions that are not associated with any known header, we should not do anything with the bodies until we map it to a header. Overall though I think requesting multiple bodies per p2p message is alright.
The bodies stage already fetches headers for each body we request, so we can use the txroots from each header to create an in-memory map TxRoot => Header (or TxRoot||OmmersRoot => Header), instead of creating a new db table?
There are a few advantages to computing roots:
- we can immediately validate peer responses
- definition of the validity of a
BlockBodiesmessage: all transactions and ommers returned match a header whose hash is in our request - having headers beforehand makes this process very easy and mainly cpu-bound (just calculating transaction and ommers root)
- we can kick malicious/faulty peers as soon as we know a response is invalid
- definition of the validity of a
- we may not need to worry about ordering, since we can use the above map to determine if a body corresponds to a hash from our request
Does this make sense? cc @mattsse @rakita @rkrasiuk
As @Rjected suggested in Discord:
- Let's assume NOTHING about what peers respond to us with.
- Let's define the validity rules.
- Let's add tests for literally any kind of raw data being fed back to us.
I want to move the discussion, from single or multi bodies request (as it does not matter a lot) to where to check transaction roots is it in the body stage or later?
multi bodies request is probably more optimal on network bandwidth as fewer requests are sent/received but it does not matter and can be switched later if we find it is a big trouble.
In both cases, we "trust" the peer f that they are delivering the requested body. @onbjerg how does the unwind happen body stage downloads all bodies, sender recovery recovers all signature, but this is checked in execution and f it fails you would unwind all recovered transaction until that block, even good ones?
and it is more safer to ask block body by its hash,
Other than that complexity, you will not have information about peer that was malicious/faulty to disconnect him.
#247 will make this much easier
We are now doing multiple bodies per request