Consensus delays when reaching mempool capacity
Describe the bug
Nodes have problems accepting blocks when they reach mempool capacity. Probably a lot of change views are happening and/or transaction exchange is not happening fast enough.
To Reproduce
Take neo-bench at c5b065f770356689852d641b43e986b4b2623991 (it's using preview4 binaries for C#) and run
$ make build
$ make start.SharpFourNodes10wrk
You need a sufficiently wide machine for this test; 16 cores should be enough.
Expected behavior
Nodes regularly producing blocks in ~5 s.
Screenshots
2020/12/28 17:29:55 Init 10 workers / 5m0s time limit (1000000 txs will try to send)
2020/12/28 17:29:56 empty block: 8
2020/12/28 17:29:57 CPU: 26.692%, Mem: 642.105MB
2020/12/28 17:30:00 (#9/11223) 11223 Tx's in 5050 ms 2222.376238 tps
2020/12/28 17:30:07 (#10/24178) 24178 Tx's in 5302 ms 4560.165975 tps
2020/12/28 17:30:07 CPU: 83.182%, Mem: 1176.242MB
2020/12/28 17:30:13 (#11/22695) 22695 Tx's in 5607 ms 4047.619048 tps
2020/12/28 17:30:17 CPU: 107.134%, Mem: 2316.840MB
2020/12/28 17:30:19 (#12/22150) 22150 Tx's in 5516 ms 4015.591008 tps
2020/12/28 17:30:25 (#13/20332) 20332 Tx's in 5633 ms 3609.444346 tps
2020/12/28 17:30:27 CPU: 75.900%, Mem: 2985.789MB
2020/12/28 17:30:31 (#14/27235) 27235 Tx's in 5934 ms 4589.652848 tps
2020/12/28 17:30:36 (#15/18628) 18628 Tx's in 5714 ms 3260.063003 tps
2020/12/28 17:30:37 CPU: 72.749%, Mem: 3549.074MB
2020/12/28 17:30:44 (#16/24702) 24702 Tx's in 6503 ms 3798.554513 tps
2020/12/28 17:30:48 CPU: 91.082%, Mem: 3965.957MB
2020/12/28 17:30:58 CPU: 51.020%, Mem: 4370.250MB
2020/12/28 17:31:08 CPU: 37.540%, Mem: 4697.836MB
2020/12/28 17:31:18 CPU: 39.054%, Mem: 5014.914MB
2020/12/28 17:31:28 CPU: 17.258%, Mem: 5367.684MB
2020/12/28 17:31:34 (#17/50000) 50000 Tx's in 37135 ms 1346.438670 tps
2020/12/28 17:31:34 empty block: 18
2020/12/28 17:31:34 (#18/0) 0 Tx's in 11302 ms 0.000000 tps
2020/12/28 17:31:37 (#19/282) 282 Tx's in 5069 ms 55.632275 tps
2020/12/28 17:31:38 CPU: 32.750%, Mem: 5222.434MB
2020/12/28 17:31:48 CPU: 85.411%, Mem: 5571.918MB
2020/12/28 17:31:58 CPU: 42.772%, Mem: 5653.996MB
2020/12/28 17:32:08 CPU: 32.732%, Mem: 5688.539MB
2020/12/28 17:32:18 CPU: 39.353%, Mem: 5725.320MB
2020/12/28 17:32:28 CPU: 25.848%, Mem: 5837.695MB
2020/12/28 17:32:38 CPU: 16.748%, Mem: 5874.668MB
2020/12/28 17:32:48 CPU: 17.612%, Mem: 5626.594MB
2020/12/28 17:32:58 CPU: 9.045%, Mem: 5657.074MB
2020/12/28 17:33:08 CPU: 19.866%, Mem: 6310.023MB
2020/12/28 17:33:12 (#20/50000) 50000 Tx's in 89536 ms 558.434596 tps
2020/12/28 17:33:15 (#21/1727) 1727 Tx's in 5616 ms 307.514245 tps
2020/12/28 17:33:16 (#22/1181) 1181 Tx's in 5185 ms 227.772420 tps
2020/12/28 17:33:18 CPU: 35.401%, Mem: 6534.309MB
2020/12/28 17:33:21 (#23/15863) 15863 Tx's in 5435 ms 2918.675253 tps
2020/12/28 17:33:28 CPU: 79.818%, Mem: 6898.348MB
2020/12/28 17:33:38 CPU: 62.619%, Mem: 7042.066MB
2020/12/28 17:33:48 CPU: 28.420%, Mem: 7145.914MB
2020/12/28 17:33:58 CPU: 21.307%, Mem: 7225.926MB
2020/12/28 17:34:01 (#24/50000) 50000 Tx's in 38075 ms 1313.197636 tps
2020/12/28 17:34:08 CPU: 76.521%, Mem: 7649.957MB
2020/12/28 17:34:18 CPU: 81.264%, Mem: 8019.516MB
2020/12/28 17:34:21 (#25/20972) 20972 Tx's in 11252 ms 1863.846427 tps
2020/12/28 17:34:28 CPU: 74.266%, Mem: 8209.602MB
2020/12/28 17:34:38 CPU: 44.148%, Mem: 8322.035MB
2020/12/28 17:34:48 CPU: 42.075%, Mem: 8250.516MB
2020/12/28 17:34:55 all request workers stopped
2020/12/28 17:34:55 Sent 481476 transactions in 5m0.093059756s
2020/12/28 17:34:55 RPS: 1604.422
2020/12/28 17:34:55 All transactions have been sent successfully
2020/12/28 17:34:55 RPC Errors: 0 / 0.000%
2020/12/28 17:34:55 sender worker stopped
2020/12/28 17:34:58 CPU: 30.588%, Mem: 8105.781MB
2020/12/28 17:35:08 CPU: 19.525%, Mem: 8215.816MB
2020/12/28 17:35:18 CPU: 0.060%, Mem: 8231.180MB
2020/12/28 17:35:28 CPU: 0.088%, Mem: 8222.883MB
2020/12/28 17:35:38 CPU: 0.094%, Mem: 8206.570MB
2020/12/28 17:35:48 CPU: 4.990%, Mem: 8016.535MB
2020/12/28 17:35:53 (#26/50000) 50000 Tx's in 99215 ms 503.956055 tps
2020/12/28 17:35:54 empty block: 27
2020/12/28 17:35:54 (#27/0) 0 Tx's in 7356 ms 0.000000 tps
2020/12/28 17:35:58 empty block: 28
2020/12/28 17:35:58 (#28/0) 0 Tx's in 5042 ms 0.000000 tps
As you can see, the time for 50K-transaction blocks jumps to at least 38 s, easily to 100 s, and sometimes even more (on different runs).
To improve reproducibility, use 15 s block intervals; this can be achieved with the following modification to neo-bench:
--- a/.docker/ir/sharp.protocol.template.yml
+++ b/.docker/ir/sharp.protocol.template.yml
@@ -31,7 +31,7 @@
SeedList:
- #@ "node:{}".format(data.values.nodes_info[i].node_port)
#@ else:
- MillisecondsPerBlock: 5000
+ MillisecondsPerBlock: 15000
MemoryPoolMaxTransactions: 50000
ValidatorsCount: #@ nodes_count - 2
StandbyCommittee:
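For a rough sense of why the longer interval clogs the pool, here is a back-of-the-envelope estimate based on the ~4000 tps seen in the logs above (an approximation, not a measurement): at that input rate a 50000-transaction pool fills within a single 15 s block interval.

package main

import "fmt"

func main() {
	const (
		poolCapacity = 50000 // MemoryPoolMaxTransactions
		inputTPS     = 4000  // rough sustained send rate seen in the logs above (assumed)
	)
	// 50000 / 4000 ≈ 12.5 s, i.e. the pool fills within one 15 s block interval.
	fmt.Printf("pool fills in ~%.1f s\n", float64(poolCapacity)/float64(inputTPS))
}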
This way the mempool obviously becomes clogged faster, and the result may look like this:
2020/12/28 17:42:04 Init 10 workers / 5m0s time limit (1000000 txs will try to send)
2020/12/28 17:42:05 empty block: 5
2020/12/28 17:42:12 CPU: 68.805%, Mem: 719.305MB
2020/12/28 17:42:22 CPU: 50.056%, Mem: 1095.645MB
2020/12/28 17:42:32 CPU: 24.116%, Mem: 1204.625MB
2020/12/28 17:42:42 CPU: 17.365%, Mem: 1265.137MB
2020/12/28 17:42:52 CPU: 13.493%, Mem: 1373.691MB
2020/12/28 17:43:02 CPU: 11.961%, Mem: 1370.391MB
2020/12/28 17:43:12 CPU: 12.627%, Mem: 1386.016MB
2020/12/28 17:43:22 CPU: 11.292%, Mem: 1395.773MB
2020/12/28 17:43:32 CPU: 9.679%, Mem: 1406.008MB
2020/12/28 17:43:42 CPU: 9.041%, Mem: 1414.945MB
2020/12/28 17:43:52 CPU: 12.767%, Mem: 1427.512MB
2020/12/28 17:44:02 CPU: 10.489%, Mem: 1525.133MB
2020/12/28 17:44:12 CPU: 8.065%, Mem: 1547.289MB
2020/12/28 17:44:22 CPU: 8.404%, Mem: 1564.160MB
2020/12/28 17:44:32 CPU: 7.556%, Mem: 1550.742MB
2020/12/28 17:44:42 CPU: 6.965%, Mem: 1551.152MB
2020/12/28 17:44:52 CPU: 7.517%, Mem: 1556.168MB
2020/12/28 17:45:02 CPU: 6.223%, Mem: 1517.473MB
2020/12/28 17:45:12 CPU: 6.606%, Mem: 1528.398MB
2020/12/28 17:45:22 CPU: 6.640%, Mem: 1533.340MB
2020/12/28 17:45:32 CPU: 5.650%, Mem: 1540.832MB
2020/12/28 17:45:42 CPU: 6.802%, Mem: 1549.254MB
2020/12/28 17:45:52 CPU: 6.082%, Mem: 1543.012MB
2020/12/28 17:46:02 CPU: 8.436%, Mem: 1561.805MB
2020/12/28 17:46:12 CPU: 6.160%, Mem: 1603.430MB
2020/12/28 17:46:22 CPU: 5.799%, Mem: 1611.719MB
2020/12/28 17:46:32 CPU: 5.492%, Mem: 1607.176MB
2020/12/28 17:46:42 CPU: 5.297%, Mem: 1620.074MB
2020/12/28 17:46:52 CPU: 4.701%, Mem: 1624.859MB
2020/12/28 17:47:02 CPU: 4.871%, Mem: 1608.625MB
2020/12/28 17:47:04 all request workers stopped
2020/12/28 17:47:04 Sent 95152 transactions in 5m0.069230339s
2020/12/28 17:47:04 RPS: 317.100
2020/12/28 17:47:04 All transactions have been sent successfully
2020/12/28 17:47:04 RPC Errors: 0 / 0.000%
2020/12/28 17:47:04 sender worker stopped
2020/12/28 17:47:12 CPU: 1.670%, Mem: 1606.434MB
2020/12/28 17:47:22 CPU: 0.548%, Mem: 1601.559MB
2020/12/28 17:47:32 CPU: 0.542%, Mem: 1588.051MB
2020/12/28 17:47:42 CPU: 0.539%, Mem: 1586.672MB
2020/12/28 17:47:52 CPU: 0.535%, Mem: 1585.195MB
2020/12/28 17:48:02 CPU: 1.466%, Mem: 1583.816MB
2020/12/28 17:48:12 CPU: 0.994%, Mem: 1583.496MB
2020/12/28 17:48:22 CPU: 0.438%, Mem: 1584.766MB
2020/12/28 17:48:32 CPU: 0.547%, Mem: 1582.051MB
2020/12/28 17:48:42 CPU: 0.528%, Mem: 1582.473MB
2020/12/28 17:48:52 CPU: 0.431%, Mem: 1585.152MB
2020/12/28 17:49:02 CPU: 0.064%, Mem: 1591.641MB
2020/12/28 17:49:12 CPU: 0.067%, Mem: 1589.688MB
2020/12/28 17:49:22 CPU: 0.546%, Mem: 1588.758MB
2020/12/28 17:49:32 CPU: 0.555%, Mem: 1588.727MB
2020/12/28 17:49:42 CPU: 0.270%, Mem: 1587.781MB
2020/12/28 17:49:52 CPU: 0.536%, Mem: 1587.996MB
2020/12/28 17:50:02 CPU: 0.996%, Mem: 1588.949MB
2020/12/28 17:50:12 CPU: 4.485%, Mem: 1619.805MB
2020/12/28 17:50:17 (#6/50000) 50000 Tx's in 490101 ms 102.019788 tps
2020/12/28 17:50:22 CPU: 3.931%, Mem: 2098.523MB
2020/12/28 17:50:27 empty block: 7
2020/12/28 17:50:27 (#7/0) 0 Tx's in 15479 ms 0.000000 tps
2020/12/28 17:50:32 CPU: 0.283%, Mem: 2315.426MB
2020/12/28 17:50:42 empty block: 8
2020/12/28 17:50:42 (#8/0) 0 Tx's in 15030 ms 0.000000 tps
Eight minutes per block. Another option is probably to reduce the mempool size:
--- a/.docker/ir/sharp.protocol.template.yml
+++ b/.docker/ir/sharp.protocol.template.yml
@@ -32,7 +32,7 @@
- #@ "node:{}".format(data.values.nodes_info[i].node_port)
#@ else:
MillisecondsPerBlock: 5000
- MemoryPoolMaxTransactions: 50000
+ MemoryPoolMaxTransactions: 5000
ValidatorsCount: #@ nodes_count - 2
StandbyCommittee:
#@ for j in range(2,nodes_count):
Even though the pool is quite small, nodes tend to spend more time accepting blocks than expected:
2020/12/28 17:52:22 Init 10 workers / 5m0s time limit (1000000 txs will try to send)
2020/12/28 17:52:23 empty block: 8
2020/12/28 17:52:24 CPU: 14.354%, Mem: 574.500MB
2020/12/28 17:52:34 CPU: 11.448%, Mem: 648.332MB
2020/12/28 17:52:44 CPU: 7.422%, Mem: 672.039MB
2020/12/28 17:52:54 CPU: 5.812%, Mem: 656.695MB
2020/12/28 17:52:57 (#9/5000) 5000 Tx's in 36710 ms 136.202670 tps
2020/12/28 17:53:02 (#10/5000) 5000 Tx's in 5057 ms 988.728495 tps
2020/12/28 17:53:04 CPU: 34.194%, Mem: 825.410MB
2020/12/28 17:53:14 CPU: 10.985%, Mem: 914.621MB
2020/12/28 17:53:15 (#11/5000) 5000 Tx's in 13591 ms 367.890516 tps
2020/12/28 17:53:24 CPU: 19.733%, Mem: 978.160MB
2020/12/28 17:53:34 CPU: 7.153%, Mem: 997.266MB
2020/12/28 17:53:44 CPU: 6.024%, Mem: 982.492MB
2020/12/28 17:53:52 (#12/5000) 5000 Tx's in 36939 ms 135.358293 tps
2020/12/28 17:53:54 CPU: 24.134%, Mem: 993.477MB
2020/12/28 17:54:04 CPU: 12.411%, Mem: 1076.871MB
2020/12/28 17:54:05 (#13/525) 525 Tx's in 5057 ms 103.816492 tps
2020/12/28 17:54:14 CPU: 6.585%, Mem: 1113.574MB
2020/12/28 17:54:24 CPU: 5.552%, Mem: 1122.895MB
2020/12/28 17:54:34 CPU: 4.075%, Mem: 1123.410MB
2020/12/28 17:54:42 (#14/5000) 5000 Tx's in 43160 ms 115.848007 tps
2020/12/28 17:54:45 CPU: 20.976%, Mem: 1146.699MB
2020/12/28 17:54:55 CPU: 10.399%, Mem: 1157.391MB
Platform:
- OS: openSUSE Leap 15.2
- Docker: 19.03.11
- Version: Neo 3 preview4.
Has this problem been solved?
I've not checked post-preview5 builds yet, but most probably it's still there.
@neo-project/ngd-shanghai Can you confirm it?
Just checked RC1 (after nspcc-dev/neo-bench#67) and it's still there:
2021/03/19 18:54:57 Init 10 workers / 5m0s time limit (1000000 txs will try to send)
2021/03/19 18:54:58 empty block: 7
2021/03/19 18:55:00 CPU: 28.424%, Mem: 660.133MB
2021/03/19 18:55:02 (#8/10625) 10625 Tx's in 5032 ms 2111.486486 tps
2021/03/19 18:55:10 CPU: 76.794%, Mem: 1323.953MB
2021/03/19 18:55:11 (#9/21472) 21472 Tx's in 5352 ms 4011.958146 tps
2021/03/19 18:55:17 (#10/24811) 24811 Tx's in 5453 ms 4549.972492 tps
2021/03/19 18:55:20 CPU: 73.968%, Mem: 2457.020MB
2021/03/19 18:55:20 (#11/18331) 18331 Tx's in 5532 ms 3313.629790 tps
2021/03/19 18:55:25 (#12/15330) 15330 Tx's in 5437 ms 2819.569616 tps
2021/03/19 18:55:30 CPU: 77.548%, Mem: 2710.785MB
2021/03/19 18:55:33 (#13/15020) 15020 Tx's in 7635 ms 1967.256058 tps
2021/03/19 18:55:40 CPU: 78.569%, Mem: 3101.770MB
2021/03/19 18:55:44 (#14/28470) 28470 Tx's in 5462 ms 5212.376419 tps
2021/03/19 18:55:50 CPU: 74.616%, Mem: 3711.117MB
2021/03/19 18:56:00 CPU: 58.177%, Mem: 3849.504MB
2021/03/19 18:56:10 CPU: 36.901%, Mem: 4075.902MB
2021/03/19 18:56:20 CPU: 27.870%, Mem: 4176.941MB
2021/03/19 18:56:30 CPU: 17.924%, Mem: 4368.148MB
2021/03/19 18:56:40 CPU: 20.382%, Mem: 4425.441MB
2021/03/19 18:56:50 CPU: 10.888%, Mem: 4427.516MB
2021/03/19 18:57:00 CPU: 10.493%, Mem: 4432.066MB
2021/03/19 18:57:10 CPU: 8.855%, Mem: 4483.203MB
2021/03/19 18:57:20 CPU: 9.511%, Mem: 4409.781MB
2021/03/19 18:57:30 CPU: 10.179%, Mem: 4399.324MB
2021/03/19 18:57:40 CPU: 10.112%, Mem: 4399.906MB
2021/03/19 18:57:50 CPU: 8.400%, Mem: 4176.969MB
2021/03/19 18:58:00 CPU: 8.004%, Mem: 4129.297MB
2021/03/19 18:58:10 CPU: 8.764%, Mem: 4114.543MB
2021/03/19 18:58:20 CPU: 7.323%, Mem: 4115.473MB
2021/03/19 18:58:30 CPU: 8.061%, Mem: 4136.352MB
2021/03/19 18:58:40 CPU: 6.480%, Mem: 4225.160MB
2021/03/19 18:58:50 CPU: 7.903%, Mem: 4222.117MB
2021/03/19 18:59:00 CPU: 6.126%, Mem: 4231.148MB
2021/03/19 18:59:10 CPU: 5.992%, Mem: 4230.012MB
2021/03/19 18:59:20 CPU: 5.639%, Mem: 4238.121MB
2021/03/19 18:59:30 CPU: 6.645%, Mem: 4239.855MB
2021/03/19 18:59:40 CPU: 7.262%, Mem: 4194.809MB
2021/03/19 18:59:50 CPU: 5.994%, Mem: 4195.391MB
2021/03/19 18:59:57 Sent 226322 transactions in 5m0.089448378s
2021/03/19 18:59:57 RPS: 754.182
2021/03/19 18:59:57 All transactions have been sent successfully
2021/03/19 18:59:57 RPC Errors: 0 / 0.000%
2021/03/19 18:59:57 all request workers stopped
2021/03/19 18:59:57 sender worker stopped
2021/03/19 19:00:00 CPU: 4.048%, Mem: 4202.633MB
2021/03/19 19:00:10 CPU: 0.881%, Mem: 4195.277MB
2021/03/19 19:00:20 CPU: 1.015%, Mem: 4191.145MB
2021/03/19 19:00:30 CPU: 0.540%, Mem: 4182.266MB
2021/03/19 19:00:40 CPU: 0.422%, Mem: 4182.090MB
2021/03/19 19:00:50 CPU: 0.544%, Mem: 4182.180MB
2021/03/19 19:01:00 CPU: 0.546%, Mem: 4182.410MB
2021/03/19 19:01:10 CPU: 0.047%, Mem: 4186.051MB
2021/03/19 19:01:20 CPU: 11.726%, Mem: 4793.660MB
2021/03/19 19:01:25 (#15/50000) 50000 Tx's in 339077 ms 147.459132 tps
2021/03/19 19:01:25 empty block: 16
2021/03/19 19:01:25 (#16/0) 0 Tx's in 6480 ms 0.000000 tps
2021/03/19 19:01:26 empty block: 17
2021/03/19 19:01:26 (#17/0) 0 Tx's in 5273 ms 0.000000 tps
2021/03/19 19:01:30 empty block: 18
2021/03/19 19:01:30 (#18/0) 0 Tx's in 5041 ms 0.000000 tps
2021/03/19 19:01:30 CPU: 11.233%, Mem: 5469.957MB
2021/03/19 19:01:39 empty block: 19
2021/03/19 19:01:39 (#19/0) 0 Tx's in 5027 ms 0.000000 tps
2021/03/19 19:01:40 CPU: 1.346%, Mem: 5315.074MB
2021/03/19 19:01:40 empty block: 20
2021/03/19 19:01:40 (#20/0) 0 Tx's in 5030 ms 0.000000 tps
2021/03/19 19:01:47 empty block: 21
2021/03/19 19:01:47 (#21/0) 0 Tx's in 5032 ms 0.000000 tps
2021/03/19 19:01:50 CPU: 0.470%, Mem: 5312.016MB
2021/03/19 19:01:50 empty block: 22
2021/03/19 19:01:50 (#22/0) 0 Tx's in 5031 ms 0.000000 tps
I think the following things happen when the transaction count reaches the mempool limit:
(1) The speaker's mempool reaches the limit (50000) and it publishes a prepare request that contains the hashes of these 50000 transactions.
(2) Other consensus nodes are in a similar situation, but the transactions in their mempools may differ because of inter-node communication and because they receive transactions at different times relative to the prepare request.
(3) So it's possible that a majority of nodes do not have all the transactions listed in the prepare request. They will try to fetch the missing transactions, but may fail because the network is busy or because some of those transactions have already been dropped by other nodes.
(4) Thus consensus fails and a change view occurs.
This is what likely happens when the transaction count reaches the mempool limit.
But this phenomenon should only occur when MaxTransactionsPerBlock is greater than or equal to the mempool capacity (as in your cases); it is not likely to happen when MaxTransactionsPerBlock is much less than the mempool capacity, I suppose.
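To make this failure mode concrete, here is a small illustrative simulation (not actual NEO/dBFT code; the pool sizes, overlap, and names are all assumptions): each node holds a capped pool of transaction ids that only partly overlaps with the others, the speaker proposes its whole pool, and each backup counts how many proposed transactions it would have to fetch before it could verify the proposal.

package main

import (
	"fmt"
	"math/rand"
)

const (
	poolCap    = 50000  // MemoryPoolMaxTransactions
	txUniverse = 120000 // transactions in flight network-wide (assumed)
	nodes      = 4
)

// randomPool picks poolCap distinct tx ids out of txUniverse,
// mimicking pools that filled up from slightly different traffic.
func randomPool(r *rand.Rand) map[int]bool {
	pool := make(map[int]bool, poolCap)
	for len(pool) < poolCap {
		pool[r.Intn(txUniverse)] = true
	}
	return pool
}

func main() {
	r := rand.New(rand.NewSource(1))
	pools := make([]map[int]bool, nodes)
	for i := range pools {
		pools[i] = randomPool(r)
	}

	// Node 0 is the speaker: its prepare request carries the hashes of
	// everything in its pool (MaxTransactionsPerBlock >= pool capacity).
	proposal := pools[0]

	for i := 1; i < nodes; i++ {
		missing := 0
		for tx := range proposal {
			if !pools[i][tx] {
				missing++
			}
		}
		// Every missing tx has to be requested over the network before the
		// backup can verify the proposal; if that takes too long, the node
		// asks for a change view instead.
		fmt.Printf("backup %d is missing %d of %d proposed txs\n", i, missing, len(proposal))
	}
}

In this toy setup each backup is missing tens of thousands of the proposed transactions; real pools overlap far more, but any transaction a backup has never seen still has to be fetched before it can respond, and that fetching races against the change-view timer.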