neo
neo copied to clipboard
Block broadcasts: inv vs ping and P2P noise
Summary or problem description
#1397 changed the way new blocks are being broadcasted through the network, before it there were inv
messages with block hash, now the node sends ping
with its new height (speculatively, I should add). This has some consequences that I'd like to discuss.
Tons of useless pings being sent
If you're to look at node's traffic for preview3 node during the initial sync, it constantly emits pings for new blocks added to the chain:
21:52:28.896477 IP (tos 0x0, ttl 64, id 7989, offset 0, flags [DF], proto TCP (6), length 67)
bilouchun.hex.40758 > 52.X.Y.Z.20333: Flags [P.], cksum 0xb187 (incorrect -> 0x8d92), seq 617:632, ack 8516, win 501, options [nop,nop,TS val 4137511299 ecr 90497254], length 15
0x0000: 4500 0043 1f35 4000 4006 6a2e c0a8 733f E..C.5@[email protected]?
0x0010: 34a6 48c4 9f36 4f6d e051 de66 ba0b 2e61 4.H..6Om.Q.f...a
0x0020: 8018 01f5 b187 0000 0101 080a f69d 6983 ..............i.
0x0030: 0564 e0e6 0018 0c01 0000 006c dd36 5f6d .d.........l.6_m
0x0040: dd6d 33 .m3
...
bilouchun.hex.40758 > 52.X.Y.Z.20333: Flags [P.], cksum 0xb187 (incorrect -> 0x1a3f), seq 632:647, ack 8516, win 501, options [nop,nop,TS val 4137511319 ecr 90497254], length 15
0x0000: 4500 0043 1f36 4000 4006 6a2d c0a8 733f E..C.6@[email protected]?
0x0010: 34a6 48c4 9f36 4f6d e051 de75 ba0b 2e61 4.H..6Om.Q.u...a
0x0020: 8018 01f5 b187 0000 0101 080a f69d 6997 ..............i.
0x0030: 0564 e0e6 0018 0c02 0000 006c dd36 5fb8 .d.........l.6_.
0x0040: 3552 4e 5RN
...
21:52:28.923353 IP (tos 0x0, ttl 64, id 7991, offset 0, flags [DF], proto TCP (6), length 67)
bilouchun.hex.40758 > 52.X.Y.Z.20333: Flags [P.], cksum 0xb187 (incorrect -> 0x7254), seq 647:662, ack 8516, win 501, options [nop,nop,TS val 4137511325 ecr 90497254], length 15
0x0000: 4500 0043 1f37 4000 4006 6a2c c0a8 733f E..C.7@[email protected],..s?
0x0010: 34a6 48c4 9f36 4f6d e051 de84 ba0b 2e61 4.H..6Om.Q.....a
0x0020: 8018 01f5 b187 0000 0101 080a f69d 699d ..............i.
0x0030: 0564 e0e6 0018 0c03 0000 006c dd36 5f3d .d.........l.6_=
0x0040: e4a1 47 ..G
All these 00180c01000000...
, 00180c02000000...
, 00180c03000000...
messages happily tell our peers that we've got block number 1, 2, 3, etc. Each block is obviously an important event in node's life, but I don't think any peer with their current ~54K height really care. Before #1397 there was some logic preventing similar behavior.
But as we're sending pings now, we also have another problem.
Ping-pong traffic
Each ping
technically requires an answer from the other side, which is pong
. And sure enough, we can see a lot of these too:
21:52:28.982387 IP (tos 0x0, ttl 111, id 25046, offset 0, flags [DF], proto TCP (6), length 67)
52.X.Y.Z.20333 > bilouchun.hex.40758: Flags [P.], cksum 0x5647 (correct), seq 67302:67317, ack 995, win 1025, options [nop,nop,TS val 90497363 ecr 4137511341], length 15
0x0000: 4500 0043 61d6 4000 6f06 f88c 34a6 48c4 [email protected].
0x0010: c0a8 733f 4f6d 9f36 ba0c 1403 e051 dfe0 ..s?Om.6.....Q..
0x0020: 8018 0401 5647 0000 0101 080a 0564 e153 ....VG.......d.S
0x0030: f69d 69ad 0019 0c09 d300 006c dd36 5f79 ..i........l.6_y
0x0040: 29e3 61 ).a
...
21:52:29.000562 IP (tos 0x0, ttl 111, id 25051, offset 0, flags [DF], proto TCP (6), length 67)
52.X.Y.Z.20333 > bilouchun.hex.40758: Flags [P.], cksum 0xd58f (correct), seq 67377:67392, ack 1070, win 1025, options [nop,nop,TS val 90497383 ecr 4137511361], length 15
0x0000: 4500 0043 61db 4000 6f06 f887 34a6 48c4 [email protected].
0x0010: c0a8 733f 4f6d 9f36 ba0c 144e e051 e02b ..s?Om.6...N.Q.+
0x0020: 8018 0401 d58f 0000 0101 080a 0564 e167 .............d.g
0x0030: f69d 69c1 0019 0c09 d300 006c dd36 5ff6 ..i........l.6_.
0x0040: ae5f 5c ._\
Even if we're to solve the first problem in some way, this one is an issue of its own, because we don't really care about pongs in this case, all we want is to tell everyone that there is a new block and to do that we have an inv
message type, that is made specifically to advertise things we have on the network. So before #1397 block broadcasting worked like inv - getdata - block
sequence, but now it's more like ping - pong, getblockbyindex - block
and what's the point of this pong
in the scheme? Also note that if the other side already has a block in question it could just ignore the inv
previously, but now it has to respond with a pong
anyway.
But there is another subtle thing if we're to look into the ping
contents.
Node's height and ping contents
It's a bit more opinionated and I've seen discussions around PingPayload.Create(Singleton.Height + 1)
and I admit that it can be somewhat worthy optimization, but at the same time it looks like a blatant lie sent to the peer. IMO ping's payload should tell everyone our current height and the other side can assume that the node has a state corresponding to that height. But it doesn't at the moment, this height is being advertised speculatively, before fully persisting the block.
Compare that to the innocent inv
that is just an advertisement, it has no obligations associated with it. We're just telling everyone that we have this thing, that's it, whether we've persisted it or not is a completely separate question.
Do you have any solution you want to propose?
- Restore logic preventing useless block "broadcasts" during initial synchronization. It can be done the way it was before #1397 or in some other fashion (like neo-go is filtering peers with
p.LastBlockIndex() < b.Index
to determine which one will get a message). - Change from
ping
messaging back toinv
for the latest block relaying, it's less traffic and it looks a bit more appropriate for the task.
Neo Version
- Neo 3
Where in the software does this update applies to?
- P2P (TCP)
IMO ping's payload should tell everyone our current height and the other side can assume that the node has a state corresponding to that height.
👍
I think that we can remove Pong
and send ping only if we spend X seconds in send a block.
@shargon why do you want to remove the Pong message? It was added for a good reason; to be able to get an up to date chain state of the remote client. AFAIK it is also used in the current syncing strategy, neo-cli + other implementations (at least Python) also make use of it for syncing.
I think that we can remove Pong and send ping only if we spend X seconds in send a block.
Well, we need to remember why these ping-pongs were added. And it happened in #899 to fix #841. A fully synchronized node of height H
is not guaranteed to receive block H+1
via regular broadcasting mechanism, it can also miss H+2
, H+3
or get them, but they couldn't be added to the chain anyway because of missing H+1
. So it uses pinging to query its neighbors about their current height to see if it can request a new block from someone (or maybe not, consensus might be delayed for some reason for H+1
). That's the essence of this mechanism, it requires both pings and pongs to work and without it the node can only wait for some new connection to a peer with bigger height (sent with a version
packet).
And we have inv
for various announcements like announcing a new block, that's what it's made for.
So it uses pinging to query its neighbors about their current height to see if it can request a new block from someone (or maybe not, consensus might be delayed for some reason for
H+1
). T
That's exactly the original reason why it got added in https://github.com/neo-project/neo/pull/673
Exactly, at the time it was added we had problems with a fixed variable called start_height
which was not updated.
The ping/pong had this main purpose of updating nodes about their status, it is more like a communication with neighbors as you emphasized.
The way blocks are being broadcasting can be seen as a different issue in my opinion. Perhaps it still need adjustments. However, a redesign may involve a change in all these feature. But, perhaps, ping/pong is quite basic. I believe we can improve it, maybe nodes can pong with an additional variable telling some prospection/trend...something like that.
Miss-clicked.
@roman-khimov, I was double checking your comment and suggestion.
I think that ping/pong is not really a traffic, it is inside the communication channel of two peers, it should not be rebroadcasted. Perhaps it can have better strategies for alleviating the burden between two peers. But I believe it is light like a minimal websocket.
But perhaps we need to think about better timing strategies for making those updates.
Currently we send ping twice, one in blockchain before persist, and other by tasks but also this tasks was raised OnBlock
I think that the blockchain ping it's not required.
https://github.com/neo-project/neo/blob/93226750fd60a161a257dcb9c42bd745cd671d5c/src/neo/Ledger/Blockchain.cs#L332-L333
Close?
Certainly not, from what I see in the code we're still misusing pings and sending one ping per persisted block.
if (block.Index == Height + 1)
{
if (!block.Verify(currentSnapshot))
return VerifyResult.Invalid;
block_cache.TryAdd(block.Hash, block);
block_cache_unverified.Remove(block.Index);
Persist(block);
SaveHeaderHashList();
if (block_cache_unverified.TryGetValue(Height + 1, out var unverifiedBlocks))
{
foreach (var unverifiedBlock in unverifiedBlocks.Blocks)
Self.Tell(unverifiedBlock, ActorRefs.NoSender);
block_cache_unverified.Remove(Height + 1);
}
// We can store the new block in block_cache and tell the new height to other nodes after Persist().
system.LocalNode.Tell(Message.Create(MessageCommand.Ping, PingPayload.Create(Singleton.Height)));
}
return VerifyResult.Succeed;
}
It happens here, @roman-khimov. Before we had a mechanism that just made rebroadcast
if close to the last known height.
I remember to comment that when we modified to index
, but it was removed.
It would be good to have a similar strategy before this line here system.LocalNode.Tell(Message.Create(MessageCommand.Ping, PingPayload.Create(Singleton.Height)));
.
It's not just that, it's also important to stop misusing pings for things that are to be done with invs.