neo icon indicating copy to clipboard operation
neo copied to clipboard

Block broadcasts: inv vs ping and P2P noise

Open roman-khimov opened this issue 4 years ago • 13 comments

Summary or problem description #1397 changed the way new blocks are being broadcasted through the network, before it there were inv messages with block hash, now the node sends ping with its new height (speculatively, I should add). This has some consequences that I'd like to discuss.

Tons of useless pings being sent

If you're to look at node's traffic for preview3 node during the initial sync, it constantly emits pings for new blocks added to the chain:

21:52:28.896477 IP (tos 0x0, ttl 64, id 7989, offset 0, flags [DF], proto TCP (6), length 67)
    bilouchun.hex.40758 > 52.X.Y.Z.20333: Flags [P.], cksum 0xb187 (incorrect -> 0x8d92), seq 617:632, ack 8516, win 501, options [nop,nop,TS val 4137511299 ecr 90497254], length 15
        0x0000:  4500 0043 1f35 4000 4006 6a2e c0a8 733f  E..C.5@[email protected]?
        0x0010:  34a6 48c4 9f36 4f6d e051 de66 ba0b 2e61  4.H..6Om.Q.f...a
        0x0020:  8018 01f5 b187 0000 0101 080a f69d 6983  ..............i.
        0x0030:  0564 e0e6 0018 0c01 0000 006c dd36 5f6d  .d.........l.6_m
        0x0040:  dd6d 33                                  .m3
...
    bilouchun.hex.40758 > 52.X.Y.Z.20333: Flags [P.], cksum 0xb187 (incorrect -> 0x1a3f), seq 632:647, ack 8516, win 501, options [nop,nop,TS val 4137511319 ecr 90497254], length 15
        0x0000:  4500 0043 1f36 4000 4006 6a2d c0a8 733f  E..C.6@[email protected]?
        0x0010:  34a6 48c4 9f36 4f6d e051 de75 ba0b 2e61  4.H..6Om.Q.u...a
        0x0020:  8018 01f5 b187 0000 0101 080a f69d 6997  ..............i.
        0x0030:  0564 e0e6 0018 0c02 0000 006c dd36 5fb8  .d.........l.6_.
        0x0040:  3552 4e                                  5RN

...
21:52:28.923353 IP (tos 0x0, ttl 64, id 7991, offset 0, flags [DF], proto TCP (6), length 67)
    bilouchun.hex.40758 > 52.X.Y.Z.20333: Flags [P.], cksum 0xb187 (incorrect -> 0x7254), seq 647:662, ack 8516, win 501, options [nop,nop,TS val 4137511325 ecr 90497254], length 15
        0x0000:  4500 0043 1f37 4000 4006 6a2c c0a8 733f  E..C.7@[email protected],..s?
        0x0010:  34a6 48c4 9f36 4f6d e051 de84 ba0b 2e61  4.H..6Om.Q.....a
        0x0020:  8018 01f5 b187 0000 0101 080a f69d 699d  ..............i.
        0x0030:  0564 e0e6 0018 0c03 0000 006c dd36 5f3d  .d.........l.6_=
        0x0040:  e4a1 47                                  ..G

All these 00180c01000000..., 00180c02000000..., 00180c03000000... messages happily tell our peers that we've got block number 1, 2, 3, etc. Each block is obviously an important event in node's life, but I don't think any peer with their current ~54K height really care. Before #1397 there was some logic preventing similar behavior.

But as we're sending pings now, we also have another problem.

Ping-pong traffic Each ping technically requires an answer from the other side, which is pong. And sure enough, we can see a lot of these too:

21:52:28.982387 IP (tos 0x0, ttl 111, id 25046, offset 0, flags [DF], proto TCP (6), length 67)
    52.X.Y.Z.20333 > bilouchun.hex.40758: Flags [P.], cksum 0x5647 (correct), seq 67302:67317, ack 995, win 1025, options [nop,nop,TS val 90497363 ecr 4137511341], length 15
        0x0000:  4500 0043 61d6 4000 6f06 f88c 34a6 48c4  [email protected].
        0x0010:  c0a8 733f 4f6d 9f36 ba0c 1403 e051 dfe0  ..s?Om.6.....Q..
        0x0020:  8018 0401 5647 0000 0101 080a 0564 e153  ....VG.......d.S
        0x0030:  f69d 69ad 0019 0c09 d300 006c dd36 5f79  ..i........l.6_y
        0x0040:  29e3 61                                  ).a
...
21:52:29.000562 IP (tos 0x0, ttl 111, id 25051, offset 0, flags [DF], proto TCP (6), length 67)
    52.X.Y.Z.20333 > bilouchun.hex.40758: Flags [P.], cksum 0xd58f (correct), seq 67377:67392, ack 1070, win 1025, options [nop,nop,TS val 90497383 ecr 4137511361], length 15
        0x0000:  4500 0043 61db 4000 6f06 f887 34a6 48c4  [email protected].
        0x0010:  c0a8 733f 4f6d 9f36 ba0c 144e e051 e02b  ..s?Om.6...N.Q.+
        0x0020:  8018 0401 d58f 0000 0101 080a 0564 e167  .............d.g
        0x0030:  f69d 69c1 0019 0c09 d300 006c dd36 5ff6  ..i........l.6_.
        0x0040:  ae5f 5c                                  ._\

Even if we're to solve the first problem in some way, this one is an issue of its own, because we don't really care about pongs in this case, all we want is to tell everyone that there is a new block and to do that we have an inv message type, that is made specifically to advertise things we have on the network. So before #1397 block broadcasting worked like inv - getdata - block sequence, but now it's more like ping - pong, getblockbyindex - block and what's the point of this pong in the scheme? Also note that if the other side already has a block in question it could just ignore the inv previously, but now it has to respond with a pong anyway.

But there is another subtle thing if we're to look into the ping contents.

Node's height and ping contents It's a bit more opinionated and I've seen discussions around PingPayload.Create(Singleton.Height + 1) and I admit that it can be somewhat worthy optimization, but at the same time it looks like a blatant lie sent to the peer. IMO ping's payload should tell everyone our current height and the other side can assume that the node has a state corresponding to that height. But it doesn't at the moment, this height is being advertised speculatively, before fully persisting the block.

Compare that to the innocent inv that is just an advertisement, it has no obligations associated with it. We're just telling everyone that we have this thing, that's it, whether we've persisted it or not is a completely separate question.

Do you have any solution you want to propose?

  1. Restore logic preventing useless block "broadcasts" during initial synchronization. It can be done the way it was before #1397 or in some other fashion (like neo-go is filtering peers with p.LastBlockIndex() < b.Index to determine which one will get a message).
  2. Change from ping messaging back to inv for the latest block relaying, it's less traffic and it looks a bit more appropriate for the task.

Neo Version

  • Neo 3

Where in the software does this update applies to?

  • P2P (TCP)

roman-khimov avatar Aug 14 '20 19:08 roman-khimov

IMO ping's payload should tell everyone our current height and the other side can assume that the node has a state corresponding to that height.

👍

ixje avatar Aug 18 '20 07:08 ixje

I think that we can remove Pong and send ping only if we spend X seconds in send a block.

shargon avatar Sep 01 '20 10:09 shargon

@shargon why do you want to remove the Pong message? It was added for a good reason; to be able to get an up to date chain state of the remote client. AFAIK it is also used in the current syncing strategy, neo-cli + other implementations (at least Python) also make use of it for syncing.

ixje avatar Sep 01 '20 11:09 ixje

I think that we can remove Pong and send ping only if we spend X seconds in send a block.

Well, we need to remember why these ping-pongs were added. And it happened in #899 to fix #841. A fully synchronized node of height H is not guaranteed to receive block H+1 via regular broadcasting mechanism, it can also miss H+2, H+3 or get them, but they couldn't be added to the chain anyway because of missing H+1. So it uses pinging to query its neighbors about their current height to see if it can request a new block from someone (or maybe not, consensus might be delayed for some reason for H+1). That's the essence of this mechanism, it requires both pings and pongs to work and without it the node can only wait for some new connection to a peer with bigger height (sent with a version packet).

And we have inv for various announcements like announcing a new block, that's what it's made for.

roman-khimov avatar Sep 01 '20 12:09 roman-khimov

So it uses pinging to query its neighbors about their current height to see if it can request a new block from someone (or maybe not, consensus might be delayed for some reason for H+1). T

That's exactly the original reason why it got added in https://github.com/neo-project/neo/pull/673

ixje avatar Sep 01 '20 13:09 ixje

Exactly, at the time it was added we had problems with a fixed variable called start_height which was not updated. The ping/pong had this main purpose of updating nodes about their status, it is more like a communication with neighbors as you emphasized.

The way blocks are being broadcasting can be seen as a different issue in my opinion. Perhaps it still need adjustments. However, a redesign may involve a change in all these feature. But, perhaps, ping/pong is quite basic. I believe we can improve it, maybe nodes can pong with an additional variable telling some prospection/trend...something like that.

vncoelho avatar Sep 01 '20 13:09 vncoelho

Miss-clicked.

vncoelho avatar Sep 01 '20 13:09 vncoelho

@roman-khimov, I was double checking your comment and suggestion.

I think that ping/pong is not really a traffic, it is inside the communication channel of two peers, it should not be rebroadcasted. Perhaps it can have better strategies for alleviating the burden between two peers. But I believe it is light like a minimal websocket.

But perhaps we need to think about better timing strategies for making those updates.

vncoelho avatar Sep 01 '20 13:09 vncoelho

Currently we send ping twice, one in blockchain before persist, and other by tasks but also this tasks was raised OnBlock I think that the blockchain ping it's not required.

https://github.com/neo-project/neo/blob/93226750fd60a161a257dcb9c42bd745cd671d5c/src/neo/Ledger/Blockchain.cs#L332-L333

shargon avatar Sep 01 '20 18:09 shargon

Close?

Tommo-L avatar Nov 25 '20 04:11 Tommo-L

Certainly not, from what I see in the code we're still misusing pings and sending one ping per persisted block.

roman-khimov avatar Nov 25 '20 08:11 roman-khimov

            if (block.Index == Height + 1)
            {
                if (!block.Verify(currentSnapshot))
                    return VerifyResult.Invalid;
                block_cache.TryAdd(block.Hash, block);
                block_cache_unverified.Remove(block.Index);
                Persist(block);
                SaveHeaderHashList();
                if (block_cache_unverified.TryGetValue(Height + 1, out var unverifiedBlocks))
                {
                    foreach (var unverifiedBlock in unverifiedBlocks.Blocks)
                        Self.Tell(unverifiedBlock, ActorRefs.NoSender);
                    block_cache_unverified.Remove(Height + 1);
                }
                // We can store the new block in block_cache and tell the new height to other nodes after Persist().
                system.LocalNode.Tell(Message.Create(MessageCommand.Ping, PingPayload.Create(Singleton.Height)));
            }
            return VerifyResult.Succeed;
        }

It happens here, @roman-khimov. Before we had a mechanism that just made rebroadcast if close to the last known height. I remember to comment that when we modified to index, but it was removed. It would be good to have a similar strategy before this line here system.LocalNode.Tell(Message.Create(MessageCommand.Ping, PingPayload.Create(Singleton.Height)));.

vncoelho avatar Nov 25 '20 13:11 vncoelho

It's not just that, it's also important to stop misusing pings for things that are to be done with invs.

roman-khimov avatar Nov 25 '20 14:11 roman-khimov