
Update libp2p to ESM version

Open wemeetagain opened this issue 2 years ago • 16 comments

Motivation

We want to get the latest libp2p, gossipsub, etc updates.

Description

TODO:

  • [ ] fix remaining type errors
  • [x] https://github.com/multiformats/js-multiaddr/pull/247 - not needed
  • [x] https://github.com/libp2p/js-libp2p-interfaces/issues/227
  • [x] merge conflicts
  • [ ] configure libp2p to not break

Closes #4140 Closes https://github.com/ChainSafe/lodestar/issues/4043

wemeetagain · Jun 03 '22

@wemeetagain: is there an estimate on when this will be completed? I'm asking because I'm hoping Lodestar can use the "canary" version of js-libp2p to provide feedback on the resource protection mechanisms that have recently been introduced.

BigLep · Jun 28 '22

Performance Report

✔️ no performance regression detected

Full benchmark results
Benchmark suite Current: 270628128add88c9eeb74850e086a16e60561269 Previous: 1beeb2bc71ea78e88ea89636d3db2682bb00c0e1 Ratio
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 3.0490 ms/op 2.2950 ms/op 1.33
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 96.803 us/op 74.553 us/op 1.30
BLS verify - blst-native 2.4159 ms/op 1.8533 ms/op 1.30
BLS verifyMultipleSignatures 3 - blst-native 4.8576 ms/op 3.7984 ms/op 1.28
BLS verifyMultipleSignatures 8 - blst-native 10.583 ms/op 8.1662 ms/op 1.30
BLS verifyMultipleSignatures 32 - blst-native 39.519 ms/op 29.656 ms/op 1.33
BLS aggregatePubkeys 32 - blst-native 52.465 us/op 39.155 us/op 1.34
BLS aggregatePubkeys 128 - blst-native 202.31 us/op 152.87 us/op 1.32
getAttestationsForBlock 126.50 ms/op 85.728 ms/op 1.48
isKnown best case - 1 super set check 584.00 ns/op 421.00 ns/op 1.39
isKnown normal case - 2 super set checks 555.00 ns/op 413.00 ns/op 1.34
isKnown worse case - 16 super set checks 553.00 ns/op 410.00 ns/op 1.35
CheckpointStateCache - add get delete 12.007 us/op 8.7520 us/op 1.37
validate gossip signedAggregateAndProof - struct 5.8516 ms/op 4.2644 ms/op 1.37
validate gossip attestation - struct 2.6393 ms/op 2.0377 ms/op 1.30
pickEth1Vote - no votes 2.7208 ms/op 2.1636 ms/op 1.26
pickEth1Vote - max votes 25.624 ms/op 19.963 ms/op 1.28
pickEth1Vote - Eth1Data hashTreeRoot value x2048 15.346 ms/op 11.365 ms/op 1.35
pickEth1Vote - Eth1Data hashTreeRoot tree x2048 26.404 ms/op 20.873 ms/op 1.26
pickEth1Vote - Eth1Data fastSerialize value x2048 1.9266 ms/op 1.4984 ms/op 1.29
pickEth1Vote - Eth1Data fastSerialize tree x2048 16.742 ms/op 13.271 ms/op 1.26
bytes32 toHexString 1.3290 us/op 1.0160 us/op 1.31
bytes32 Buffer.toString(hex) 797.00 ns/op 740.00 ns/op 1.08
bytes32 Buffer.toString(hex) from Uint8Array 1.2040 us/op 1.0110 us/op 1.19
bytes32 Buffer.toString(hex) + 0x 833.00 ns/op 737.00 ns/op 1.13
Object access 1 prop 0.49200 ns/op 0.36700 ns/op 1.34
Map access 1 prop 0.36200 ns/op 0.29700 ns/op 1.22
Object get x1000 17.814 ns/op 18.194 ns/op 0.98
Map get x1000 1.0130 ns/op 1.0350 ns/op 0.98
Object set x1000 127.57 ns/op 117.68 ns/op 1.08
Map set x1000 80.423 ns/op 68.539 ns/op 1.17
Return object 10000 times 0.40130 ns/op 0.37780 ns/op 1.06
Throw Error 10000 times 8.3322 us/op 5.9567 us/op 1.40
enrSubnets - fastDeserialize 64 bits 3.3560 us/op 2.6100 us/op 1.29
enrSubnets - ssz BitVector 64 bits 808.00 ns/op 748.00 ns/op 1.08
enrSubnets - fastDeserialize 4 bits 435.00 ns/op 381.00 ns/op 1.14
enrSubnets - ssz BitVector 4 bits 833.00 ns/op 764.00 ns/op 1.09
prioritizePeers score -10:0 att 32-0.1 sync 2-0 116.00 us/op 95.978 us/op 1.21
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 156.63 us/op 124.82 us/op 1.25
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 291.37 us/op 213.17 us/op 1.37
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 668.00 us/op 467.01 us/op 1.43
prioritizePeers score 0:0 att 64-1 sync 4-1 608.12 us/op 464.00 us/op 1.31
RateTracker 1000000 limit, 1 obj count per request 229.73 ns/op 198.88 ns/op 1.16
RateTracker 1000000 limit, 2 obj count per request 161.02 ns/op 153.91 ns/op 1.05
RateTracker 1000000 limit, 4 obj count per request 140.08 ns/op 129.29 ns/op 1.08
RateTracker 1000000 limit, 8 obj count per request 128.65 ns/op 117.24 ns/op 1.10
RateTracker with prune 5.4210 us/op 4.3870 us/op 1.24
array of 16000 items push then shift 5.7012 us/op 3.1716 us/op 1.80
LinkedList of 16000 items push then shift 23.408 ns/op 18.411 ns/op 1.27
array of 16000 items push then pop 287.55 ns/op 240.67 ns/op 1.19
LinkedList of 16000 items push then pop 19.055 ns/op 17.484 ns/op 1.09
array of 24000 items push then shift 8.4241 us/op 4.5562 us/op 1.85
LinkedList of 24000 items push then shift 22.135 ns/op 22.420 ns/op 0.99
array of 24000 items push then pop 261.41 ns/op 212.00 ns/op 1.23
LinkedList of 24000 items push then pop 21.404 ns/op 19.634 ns/op 1.09
intersect bitArray bitLen 8 15.382 ns/op 11.762 ns/op 1.31
intersect array and set length 8 231.81 ns/op 161.62 ns/op 1.43
intersect bitArray bitLen 128 84.892 ns/op 61.951 ns/op 1.37
intersect array and set length 128 2.8094 us/op 2.2117 us/op 1.27
Buffer.concat 32 items 2.7360 ns/op 1.9180 ns/op 1.43
pass gossip attestations to forkchoice per slot 7.5348 ms/op 4.0512 ms/op 1.86
computeDeltas 6.0913 ms/op 6.3272 ms/op 0.96
computeProposerBoostScoreFromBalances 915.75 us/op 907.80 us/op 1.01
altair processAttestation - 250000 vs - 7PWei normalcase 5.6117 ms/op 3.9559 ms/op 1.42
altair processAttestation - 250000 vs - 7PWei worstcase 8.1975 ms/op 5.7794 ms/op 1.42
altair processAttestation - setStatus - 1/6 committees join 288.96 us/op 216.77 us/op 1.33
altair processAttestation - setStatus - 1/3 committees join 543.17 us/op 416.59 us/op 1.30
altair processAttestation - setStatus - 1/2 committees join 849.38 us/op 583.20 us/op 1.46
altair processAttestation - setStatus - 2/3 committees join 1.0761 ms/op 747.93 us/op 1.44
altair processAttestation - setStatus - 4/5 committees join 1.4851 ms/op 1.0349 ms/op 1.44
altair processAttestation - setStatus - 100% committees join 1.8126 ms/op 1.2305 ms/op 1.47
altair processBlock - 250000 vs - 7PWei normalcase 33.919 ms/op 29.260 ms/op 1.16
altair processBlock - 250000 vs - 7PWei normalcase hashState 47.411 ms/op 40.503 ms/op 1.17
altair processBlock - 250000 vs - 7PWei worstcase 120.89 ms/op 83.742 ms/op 1.44
altair processBlock - 250000 vs - 7PWei worstcase hashState 125.70 ms/op 108.82 ms/op 1.16
phase0 processBlock - 250000 vs - 7PWei normalcase 5.1877 ms/op 4.3816 ms/op 1.18
phase0 processBlock - 250000 vs - 7PWei worstcase 60.410 ms/op 49.045 ms/op 1.23
altair processEth1Data - 250000 vs - 7PWei normalcase 1.1887 ms/op 814.30 us/op 1.46
Tree 40 250000 create 1.0853 s/op 830.11 ms/op 1.31
Tree 40 250000 get(125000) 341.21 ns/op 288.04 ns/op 1.18
Tree 40 250000 set(125000) 2.8838 us/op 2.9523 us/op 0.98
Tree 40 250000 toArray() 38.926 ms/op 33.907 ms/op 1.15
Tree 40 250000 iterate all - toArray() + loop 39.657 ms/op 34.791 ms/op 1.14
Tree 40 250000 iterate all - get(i) 145.47 ms/op 114.74 ms/op 1.27
MutableVector 250000 create 16.893 ms/op 18.215 ms/op 0.93
MutableVector 250000 get(125000) 14.166 ns/op 14.802 ns/op 0.96
MutableVector 250000 set(125000) 843.06 ns/op 717.62 ns/op 1.17
MutableVector 250000 toArray() 7.4086 ms/op 7.8917 ms/op 0.94
MutableVector 250000 iterate all - toArray() + loop 7.9347 ms/op 7.9987 ms/op 0.99
MutableVector 250000 iterate all - get(i) 3.5385 ms/op 3.4406 ms/op 1.03
Array 250000 create 7.0889 ms/op 7.3742 ms/op 0.96
Array 250000 clone - spread 3.8526 ms/op 4.2339 ms/op 0.91
Array 250000 get(125000) 1.6610 ns/op 1.7150 ns/op 0.97
Array 250000 set(125000) 1.7860 ns/op 1.7020 ns/op 1.05
Array 250000 iterate all - loop 153.20 us/op 167.87 us/op 0.91
effectiveBalanceIncrements clone Uint8Array 300000 356.97 us/op 85.549 us/op 4.17
effectiveBalanceIncrements clone MutableVector 300000 803.00 ns/op 630.00 ns/op 1.27
effectiveBalanceIncrements rw all Uint8Array 300000 314.02 us/op 252.73 us/op 1.24
effectiveBalanceIncrements rw all MutableVector 300000 227.69 ms/op 169.24 ms/op 1.35
phase0 afterProcessEpoch - 250000 vs - 7PWei 224.87 ms/op 201.31 ms/op 1.12
phase0 beforeProcessEpoch - 250000 vs - 7PWei 84.716 ms/op 80.299 ms/op 1.06
altair processEpoch - mainnet_e81889 677.45 ms/op 602.13 ms/op 1.13
mainnet_e81889 - altair beforeProcessEpoch 191.61 ms/op 156.50 ms/op 1.22
mainnet_e81889 - altair processJustificationAndFinalization 74.695 us/op 28.365 us/op 2.63
mainnet_e81889 - altair processInactivityUpdates 11.347 ms/op 11.819 ms/op 0.96
mainnet_e81889 - altair processRewardsAndPenalties 105.32 ms/op 96.361 ms/op 1.09
mainnet_e81889 - altair processRegistryUpdates 15.393 us/op 4.0670 us/op 3.78
mainnet_e81889 - altair processSlashings 4.6940 us/op 972.00 ns/op 4.83
mainnet_e81889 - altair processEth1DataReset 4.1230 us/op 898.00 ns/op 4.59
mainnet_e81889 - altair processEffectiveBalanceUpdates 2.6356 ms/op 2.4184 ms/op 1.09
mainnet_e81889 - altair processSlashingsReset 25.014 us/op 9.2450 us/op 2.71
mainnet_e81889 - altair processRandaoMixesReset 25.251 us/op 6.9030 us/op 3.66
mainnet_e81889 - altair processHistoricalRootsUpdate 3.9270 us/op 1.1710 us/op 3.35
mainnet_e81889 - altair processParticipationFlagUpdates 14.944 us/op 4.5380 us/op 3.29
mainnet_e81889 - altair processSyncCommitteeUpdates 3.6410 us/op 764.00 ns/op 4.77
mainnet_e81889 - altair afterProcessEpoch 221.67 ms/op 198.28 ms/op 1.12
phase0 processEpoch - mainnet_e58758 616.17 ms/op 545.99 ms/op 1.13
mainnet_e58758 - phase0 beforeProcessEpoch 327.02 ms/op 245.62 ms/op 1.33
mainnet_e58758 - phase0 processJustificationAndFinalization 67.292 us/op 24.110 us/op 2.79
mainnet_e58758 - phase0 processRewardsAndPenalties 91.132 ms/op 132.24 ms/op 0.69
mainnet_e58758 - phase0 processRegistryUpdates 34.264 us/op 12.119 us/op 2.83
mainnet_e58758 - phase0 processSlashings 3.9410 us/op 982.00 ns/op 4.01
mainnet_e58758 - phase0 processEth1DataReset 4.0030 us/op 1.0220 us/op 3.92
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 2.0074 ms/op 2.3950 ms/op 0.84
mainnet_e58758 - phase0 processSlashingsReset 17.223 us/op 4.6280 us/op 3.72
mainnet_e58758 - phase0 processRandaoMixesReset 27.886 us/op 5.0820 us/op 5.49
mainnet_e58758 - phase0 processHistoricalRootsUpdate 4.7070 us/op 888.00 ns/op 5.30
mainnet_e58758 - phase0 processParticipationRecordUpdates 25.906 us/op 5.0020 us/op 5.18
mainnet_e58758 - phase0 afterProcessEpoch 168.22 ms/op 173.60 ms/op 0.97
phase0 processEffectiveBalanceUpdates - 250000 normalcase 2.4557 ms/op 2.6266 ms/op 0.93
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 2.7366 ms/op 3.4871 ms/op 0.78
altair processInactivityUpdates - 250000 normalcase 67.569 ms/op 41.647 ms/op 1.62
altair processInactivityUpdates - 250000 worstcase 55.950 ms/op 51.814 ms/op 1.08
phase0 processRegistryUpdates - 250000 normalcase 27.994 us/op 8.4310 us/op 3.32
phase0 processRegistryUpdates - 250000 badcase_full_deposits 540.09 us/op 455.39 us/op 1.19
phase0 processRegistryUpdates - 250000 worstcase 0.5 279.11 ms/op 218.11 ms/op 1.28
altair processRewardsAndPenalties - 250000 normalcase 111.94 ms/op 132.83 ms/op 0.84
altair processRewardsAndPenalties - 250000 worstcase 166.70 ms/op 86.961 ms/op 1.92
phase0 getAttestationDeltas - 250000 normalcase 15.251 ms/op 13.380 ms/op 1.14
phase0 getAttestationDeltas - 250000 worstcase 14.338 ms/op 13.583 ms/op 1.06
phase0 processSlashings - 250000 worstcase 6.7549 ms/op 5.4716 ms/op 1.23
altair processSyncCommitteeUpdates - 250000 344.39 ms/op 279.04 ms/op 1.23
BeaconState.hashTreeRoot - No change 633.00 ns/op 479.00 ns/op 1.32
BeaconState.hashTreeRoot - 1 full validator 71.372 us/op 64.361 us/op 1.11
BeaconState.hashTreeRoot - 32 full validator 829.38 us/op 678.28 us/op 1.22
BeaconState.hashTreeRoot - 512 full validator 8.8300 ms/op 6.8405 ms/op 1.29
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 102.53 us/op 87.513 us/op 1.17
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 1.6028 ms/op 1.1719 ms/op 1.37
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 19.592 ms/op 15.904 ms/op 1.23
BeaconState.hashTreeRoot - 1 balances 84.721 us/op 62.226 us/op 1.36
BeaconState.hashTreeRoot - 32 balances 714.47 us/op 615.06 us/op 1.16
BeaconState.hashTreeRoot - 512 balances 7.5394 ms/op 5.8641 ms/op 1.29
BeaconState.hashTreeRoot - 250000 balances 124.75 ms/op 94.954 ms/op 1.31
aggregationBits - 2048 els - zipIndexesInBitList 35.332 us/op 36.236 us/op 0.98
regular array get 100000 times 62.570 us/op 67.419 us/op 0.93
wrappedArray get 100000 times 61.818 us/op 67.445 us/op 0.92
arrayWithProxy get 100000 times 37.099 ms/op 28.922 ms/op 1.28
ssz.Root.equals 602.00 ns/op 533.00 ns/op 1.13
byteArrayEquals 599.00 ns/op 507.00 ns/op 1.18
shuffle list - 16384 els 13.165 ms/op 11.080 ms/op 1.19
shuffle list - 250000 els 185.06 ms/op 166.95 ms/op 1.11
processSlot - 1 slots 17.990 us/op 14.083 us/op 1.28
processSlot - 32 slots 2.4390 ms/op 1.7584 ms/op 1.39
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei 473.10 us/op 399.59 us/op 1.18
getCommitteeAssignments - req 1 vs - 250000 vc 5.5776 ms/op 5.3835 ms/op 1.04
getCommitteeAssignments - req 100 vs - 250000 vc 8.2227 ms/op 7.3234 ms/op 1.12
getCommitteeAssignments - req 1000 vs - 250000 vc 9.1009 ms/op 7.7859 ms/op 1.17
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei 10.710 ns/op 10.840 ns/op 0.99
state getBlockRootAtSlot - 250000 vs - 7PWei 1.5647 us/op 1.0767 us/op 1.45
computeProposers - vc 250000 23.492 ms/op 17.763 ms/op 1.32
computeEpochShuffling - vc 250000 180.65 ms/op 169.58 ms/op 1.07
getNextSyncCommittee - vc 250000 357.28 ms/op 271.46 ms/op 1.32

by benchmarkbot/action
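The Ratio column is simply the current time divided by the previous time; a regression check can then flag any benchmark whose ratio exceeds a threshold. A minimal TypeScript sketch of that idea (the `findRegressions` name and the 2x threshold are illustrative assumptions, not the bot's actual configuration):

```typescript
// Illustrative only: a ratio-based regression check in the spirit of the
// benchmark report above. The 2x threshold is an assumed value.
interface BenchResult {
  name: string;
  current: number; // e.g. ms/op on this commit
  previous: number; // ms/op on the baseline commit
}

function findRegressions(results: BenchResult[], threshold = 2): string[] {
  return results
    .filter((r) => r.current / r.previous > threshold)
    .map((r) => r.name);
}
```

With such a threshold, none of the rows above (worst ratio 4.83 on a sub-5 us/op micro-benchmark aside, which bots typically exempt by absolute-time filters) would be treated as a hard failure, matching the "no performance regression detected" verdict.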

github-actions[bot] · Jun 29 '22

This is ready for initial review; the last remaining items to fix relate to updating stream limits per protocol. Running this locally produces many "too many streams" errors per protocol.
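The "too many streams" errors come from per-protocol caps on concurrently open streams: once a protocol hits its limit, further opens are rejected until streams close. A self-contained sketch of that limiting behavior (this `StreamLimiter` is hypothetical, not libp2p's actual implementation):

```typescript
// Hypothetical illustration of per-protocol stream caps; libp2p enforces
// these limits internally, this class just mimics the observable behavior.
class StreamLimiter {
  private counts = new Map<string, number>();
  constructor(private readonly maxStreamsPerProtocol: number) {}

  /** Try to open a stream for `protocol`; false means "too many streams". */
  tryOpen(protocol: string): boolean {
    const n = this.counts.get(protocol) ?? 0;
    if (n >= this.maxStreamsPerProtocol) return false;
    this.counts.set(protocol, n + 1);
    return true;
  }

  /** Release a stream slot when the stream closes. */
  close(protocol: string): void {
    const n = this.counts.get(protocol) ?? 0;
    if (n > 0) this.counts.set(protocol, n - 1);
  }
}
```

The fix is therefore to raise the caps to match how many concurrent req/resp and gossipsub streams a beacon node actually holds per protocol.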

wemeetagain · Jun 30 '22

Seeing gossip logs like:

  libp2p:gossipsub Cannot send RPC to 16Uiu2HAm9KWBLBBNMg7fvQNjYEkja8XgRLaT59AdvuYj3WH8DwTS as there is no open stream to it available +0ms
  libp2p:gossipsub:score unexpected delivery: message from 16Uiu2HAmE3P9zMXa33mafrvKemkPfMLLwFc9wpVwMFCQErfx6FPa was first seen 2505 ago and has delivery status NaN +0ms
  libp2p:gossipsub Cannot send RPC to 16Uiu2HAm2zjUvbtntJfrpXz2ehpv9BhHRqjgziYWbbi3nKoGsEoh as there is no open stream to it available +0ms
  libp2p:gossipsub Cannot send RPC to 16Uiu2HAm9KWBLBBNMg7fvQNjYEkja8XgRLaT59AdvuYj3WH8DwTS as there is no open stream to it available +0ms
  libp2p:gossipsub:score unexpected delivery: message from 16Uiu2HAmE3P9zMXa33mafrvKemkPfMLLwFc9wpVwMFCQErfx6FPa was first seen 2525 ago and has delivery status NaN +1ms
  libp2p:gossipsub Cannot send RPC to 16Uiu2HAm2zjUvbtntJfrpXz2ehpv9BhHRqjgziYWbbi3nKoGsEoh as there is no open stream to it available +1ms
  libp2p:gossipsub Cannot send RPC to 16Uiu2HAm9KWBLBBNMg7fvQNjYEkja8XgRLaT59AdvuYj3WH8DwTS as there is no open stream to it available +0ms
  libp2p:gossipsub:score unexpected delivery: message from 16Uiu2HAmE3P9zMXa33mafrvKemkPfMLLwFc9wpVwMFCQErfx6FPa was first seen 2671 ago and has delivery status NaN +1ms
  libp2p:gossipsub Cannot send RPC to 16Uiu2HAm2zjUvbtntJfrpXz2ehpv9BhHRqjgziYWbbi3nKoGsEoh as there is no open stream to it available +1ms
  libp2p:gossipsub Cannot send RPC to 16Uiu2HAm9KWBLBBNMg7fvQNjYEkja8XgRLaT59AdvuYj3WH8DwTS as there is no open stream to it available +0ms
  libp2p:gossipsub:score unexpected delivery: message from 16Uiu2HAmE3P9zMXa33mafrvKemkPfMLLwFc9wpVwMFCQErfx6FPa was first seen 3083 ago and has delivery status NaN +1ms
  libp2p:gossipsub Cannot send RPC to 16Uiu2HAm2zjUvbtntJfrpXz2ehpv9BhHRqjgziYWbbi3nKoGsEoh as there is no open stream to it available +1ms
  libp2p:gossipsub Cannot send RPC to 16Uiu2HAm9KWBLBBNMg7fvQNjYEkja8XgRLaT59AdvuYj3WH8DwTS as there is no open stream to it available +0ms
  libp2p:gossipsub:score unexpected delivery: message from 16Uiu2HAmE3P9zMXa33mafrvKemkPfMLLwFc9wpVwMFCQErfx6FPa was first seen 3152 ago and has delivery status NaN +0ms
  libp2p:gossipsub Cannot send RPC to 16Uiu2HAm2zjUvbtntJfrpXz2ehpv9BhHRqjgziYWbbi3nKoGsEoh as there is no open stream to it available +0ms
  libp2p:gossipsub Cannot send RPC to 16Uiu2HAm9KWBLBBNMg7fvQNjYEkja8XgRLaT59AdvuYj3WH8DwTS as there is no open stream to it available +0ms
  libp2p:gossipsub:score unexpected delivery: message from 16Uiu2HAmE3P9zMXa33mafrvKemkPfMLLwFc9wpVwMFCQErfx6FPa was first seen 3212 ago and has delivery status NaN +0ms
  libp2p:gossipsub Cannot send RPC to 16Uiu2HAm2zjUvbtntJfrpXz2ehpv9BhHRqjgziYWbbi3nKoGsEoh as there is no open stream to it available +1ms
  libp2p:gossipsub Cannot send RPC to 16Uiu2HAm9KWBLBBNMg7fvQNjYEkja8XgRLaT59AdvuYj3WH8DwTS as there is no open stream to it available +0ms
  libp2p:gossipsub:score unexpected delivery: message from 16Uiu2HAmE3P9zMXa33mafrvKemkPfMLLwFc9wpVwMFCQErfx6FPa was first seen 3225 ago and has delivery status NaN +1ms
  libp2p:gossipsub Cannot send RPC to 16Uiu2HAm2zjUvbtntJfrpXz2ehpv9BhHRqjgziYWbbi3nKoGsEoh as there is no open stream to it available +0ms
  libp2p:gossipsub Cannot send RPC to 16Uiu2HAm9KWBLBBNMg7fvQNjYEkja8XgRLaT59AdvuYj3WH8DwTS as there is no open stream to it available +0ms
  libp2p:gossipsub:score unexpected delivery: message from 16Uiu2HAmE3P9zMXa33mafrvKemkPfMLLwFc9wpVwMFCQErfx6FPa was first seen 3254 ago and has delivery status NaN +1ms
  libp2p:gossipsub Cannot send RPC to 16Uiu2HAm2zjUvbtntJfrpXz2ehpv9BhHRqjgziYWbbi3nKoGsEoh as there is no open stream to it available +1ms
  libp2p:gossipsub Cannot send RPC to 16Uiu2HAm9KWBLBBNMg7fvQNjYEkja8XgRLaT59AdvuYj3WH8DwTS as there is no open stream to it available +0ms

wemeetagain · Jun 30 '22

Gossipsub appears to be working after https://github.com/ChainSafe/js-libp2p-gossipsub/pull/298.

The last remaining issue concerns req/resp behavior:

With this PR: [screenshot: 2022-07-09 14-22-02]

Current stable: [screenshot: 2022-07-09 14-23-10]

wemeetagain · Jul 09 '22

Looking at the logs:

This PR: cat logs | grep method=ping

Jul-09 19:11:54.736[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Nimbus, peer=16...Cqkngc, requestId=9097, body=0
Jul-09 19:11:54.738[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Nimbus, peer=16...Cqkngc, requestId=9097
Jul-09 19:11:55.322[NETWORK]       verbose: Req  done method=ping, encoding=ssz_snappy, client=Nimbus, peer=16...Cqkngc, requestId=9097, numResponse=1
Jul-09 19:11:58.431[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Unknown, peer=16...x9EBmM, requestId=9098
Jul-09 19:11:58.448[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Unknown, peer=16...oFAqDy, requestId=9100
Jul-09 19:11:58.485[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Unknown, peer=16...eoCHbq, requestId=9102
Jul-09 19:11:58.503[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Unknown, peer=16...Yux7ye, requestId=9104
Jul-09 19:11:58.556[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Unknown, peer=16...x9EBmM, requestId=9098, body=0
Jul-09 19:11:58.558[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Unknown, peer=16...x9EBmM, requestId=9098
Jul-09 19:11:58.561[NETWORK]       verbose: Req  error method=ping, encoding=ssz_snappy, client=Unknown, peer=16...x9EBmM, requestId=9098 code=REQUEST_ERROR_EMPTY_RESPONSE
Jul-09 19:11:58.586[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Unknown, peer=16...oFAqDy, requestId=9100, body=0
Jul-09 19:11:58.587[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Unknown, peer=16...oFAqDy, requestId=9100
Jul-09 19:11:58.590[NETWORK]       verbose: Req  error method=ping, encoding=ssz_snappy, client=Unknown, peer=16...oFAqDy, requestId=9100 code=REQUEST_ERROR_EMPTY_RESPONSE
Jul-09 19:11:58.610[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Unknown, peer=16...kreBey, requestId=9106
Jul-09 19:11:58.623[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Unknown, peer=16...eoCHbq, requestId=9102, body=0
Jul-09 19:11:58.625[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Unknown, peer=16...eoCHbq, requestId=9102
Jul-09 19:11:58.627[NETWORK]       verbose: Req  error method=ping, encoding=ssz_snappy, client=Unknown, peer=16...eoCHbq, requestId=9102 code=REQUEST_ERROR_EMPTY_RESPONSE
Jul-09 19:11:58.668[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Unknown, peer=16...Yux7ye, requestId=9104, body=0
Jul-09 19:11:58.671[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Unknown, peer=16...Yux7ye, requestId=9104
Jul-09 19:11:58.674[NETWORK]       verbose: Req  error method=ping, encoding=ssz_snappy, client=Unknown, peer=16...Yux7ye, requestId=9104 code=REQUEST_ERROR_EMPTY_RESPONSE
Jul-09 19:11:58.736[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Unknown, peer=16...3C524j, requestId=9108
Jul-09 19:11:58.857[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Unknown, peer=16...pAkwoA, requestId=9110
Jul-09 19:11:58.870[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Unknown, peer=16...TUjGiF, requestId=9112
Jul-09 19:11:58.925[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Unknown, peer=16...8GBh6W, requestId=9114
Jul-09 19:11:58.933[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Unknown, peer=16...3C524j, requestId=9108, body=0
Jul-09 19:11:58.937[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Unknown, peer=16...3C524j, requestId=9108
Jul-09 19:11:58.939[NETWORK]       verbose: Req  error method=ping, encoding=ssz_snappy, client=Unknown, peer=16...3C524j, requestId=9108 code=REQUEST_ERROR_EMPTY_RESPONSE
Jul-09 19:11:58.947[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Unknown, peer=16...kreBey, requestId=9106, body=0
Jul-09 19:11:58.950[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Unknown, peer=16...kreBey, requestId=9106
Jul-09 19:11:58.959[NETWORK]       verbose: Req  error method=ping, encoding=ssz_snappy, client=Unknown, peer=16...kreBey, requestId=9106 code=REQUEST_ERROR_EMPTY_RESPONSE
Jul-09 19:11:59.030[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Unknown, peer=16...BFTf6V, requestId=9116
Jul-09 19:11:59.072[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Unknown, peer=16...8GBh6W, requestId=9114, body=0
Jul-09 19:11:59.073[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Unknown, peer=16...8GBh6W, requestId=9114
Jul-09 19:11:59.081[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Unknown, peer=16...PV9yXS, requestId=9118
Jul-09 19:11:59.160[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Unknown, peer=16...FQNgj9, requestId=9120
Jul-09 19:11:59.174[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Unknown, peer=16...pAkwoA, requestId=9110, body=0
Jul-09 19:11:59.177[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Unknown, peer=16...pAkwoA, requestId=9110
Jul-09 19:11:59.181[NETWORK]       verbose: Req  error method=ping, encoding=ssz_snappy, client=Unknown, peer=16...pAkwoA, requestId=9110 code=REQUEST_ERROR_EMPTY_RESPONSE
Jul-09 19:11:59.189[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Unknown, peer=16...TUjGiF, requestId=9112, body=0
Jul-09 19:11:59.191[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Unknown, peer=16...TUjGiF, requestId=9112
Jul-09 19:11:59.193[NETWORK]       verbose: Req  error method=ping, encoding=ssz_snappy, client=Unknown, peer=16...TUjGiF, requestId=9112 code=REQUEST_ERROR_EMPTY_RESPONSE
Jul-09 19:11:59.249[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Unknown, peer=16...vQ6bQN, requestId=9122
Jul-09 19:11:59.348[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Unknown, peer=16...BFTf6V, requestId=9116, body=0
Jul-09 19:11:59.349[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Unknown, peer=16...BFTf6V, requestId=9116
Jul-09 19:11:59.356[NETWORK]       verbose: Req  error method=ping, encoding=ssz_snappy, client=Unknown, peer=16...BFTf6V, requestId=9116 code=REQUEST_ERROR_EMPTY_RESPONSE
Jul-09 19:11:59.362[NETWORK]       verbose: Req  done method=ping, encoding=ssz_snappy, client=Unknown, peer=16...8GBh6W, requestId=9114, numResponse=1
Jul-09 19:11:59.398[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Unknown, peer=16...vQ6bQN, requestId=9122, body=0
Jul-09 19:11:59.400[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Unknown, peer=16...vQ6bQN, requestId=9122
Jul-09 19:11:59.416[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Unknown, peer=16...PV9yXS, requestId=9118, body=0
Jul-09 19:11:59.418[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Unknown, peer=16...PV9yXS, requestId=9118
Jul-09 19:11:59.422[NETWORK]       verbose: Req  error method=ping, encoding=ssz_snappy, client=Unknown, peer=16...PV9yXS, requestId=9118 code=REQUEST_ERROR_EMPTY_RESPONSE
Jul-09 19:11:59.525[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Unknown, peer=16...FQNgj9, requestId=9120, body=0
Jul-09 19:11:59.534[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Unknown, peer=16...FQNgj9, requestId=9120
Jul-09 19:11:59.539[NETWORK]       verbose: Req  error method=ping, encoding=ssz_snappy, client=Unknown, peer=16...FQNgj9, requestId=9120 code=REQUEST_ERROR_EMPTY_RESPONSE
Jul-09 19:11:59.834[NETWORK]       verbose: Req  done method=ping, encoding=ssz_snappy, client=Unknown, peer=16...vQ6bQN, requestId=9122, numResponse=1
Jul-09 19:12:03.096[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Prysm, peer=16...AmMsUF, requestId=9126
Jul-09 19:12:03.242[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Prysm, peer=16...AmMsUF, requestId=9126, body=0
Jul-09 19:12:03.242[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Prysm, peer=16...AmMsUF, requestId=9126
Jul-09 19:12:03.385[NETWORK]       verbose: Req  done method=ping, encoding=ssz_snappy, client=Prysm, peer=16...AmMsUF, requestId=9126, numResponse=1
Jul-09 19:12:07.576[NETWORK]         debug: Resp received request method=ping, client=Prysm, peer=16...nRwVeq, requestId=4994, body=1
Jul-09 19:12:07.578[NETWORK]       verbose: Resp done method=ping, client=Prysm, peer=16...nRwVeq, requestId=4994
Jul-09 19:12:09.412[NETWORK]         debug: Resp received request method=ping, client=Lighthouse, peer=16...HQRasZ, requestId=4995, body=1858
Jul-09 19:12:09.413[NETWORK]       verbose: Resp done method=ping, client=Lighthouse, peer=16...HQRasZ, requestId=4995
Jul-09 19:12:13.119[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Nimbus, peer=16...poaCnX, requestId=9127
Jul-09 19:12:13.779[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Nimbus, peer=16...poaCnX, requestId=9127, body=0
Jul-09 19:12:13.782[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Nimbus, peer=16...poaCnX, requestId=9127
Jul-09 19:12:15.208[NETWORK]       verbose: Req  done method=ping, encoding=ssz_snappy, client=Nimbus, peer=16...poaCnX, requestId=9127, numResponse=1
Jul-09 19:12:17.387[NETWORK]         debug: Resp received request method=ping, client=Prysm, peer=16...X1CM45, requestId=4997, body=1
Jul-09 19:12:17.540[NETWORK]       verbose: Resp done method=ping, client=Prysm, peer=16...X1CM45, requestId=4997
Jul-09 19:12:18.188[NETWORK]         debug: Resp received request method=ping, client=Prysm, peer=16...AmMsUF, requestId=4999, body=16
Jul-09 19:12:18.189[NETWORK]       verbose: Resp done method=ping, client=Prysm, peer=16...AmMsUF, requestId=4999
Jul-09 19:12:20.391[NETWORK]         debug: Resp received request method=ping, client=Unknown, peer=16...vQ6bQN, requestId=5000, body=535
Jul-09 19:12:20.394[NETWORK]       verbose: Resp done method=ping, client=Unknown, peer=16...vQ6bQN, requestId=5000

Current stable: cat logs | grep method=ping

Jul-07 23:59:26.186[NETWORK]         debug: Resp received request method=ping, client=Teku, peer=16...kZKAAL, requestId=617659, body=1
Jul-07 23:59:26.187[NETWORK]       verbose: Resp done method=ping, client=Teku, peer=16...kZKAAL, requestId=617659
Jul-07 23:59:28.256[NETWORK]         debug: Resp received request method=ping, client=Lighthouse, peer=16...tXTAHY, requestId=617660, body=1865
Jul-07 23:59:28.257[NETWORK]       verbose: Resp done method=ping, client=Lighthouse, peer=16...tXTAHY, requestId=617660
Jul-07 23:59:29.667[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Nimbus, peer=16...b81hCx, requestId=223310
Jul-07 23:59:29.668[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Prysm, peer=16...TNFJH1, requestId=223311
Jul-07 23:59:29.763[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Nimbus, peer=16...b81hCx, requestId=223310, body=7
Jul-07 23:59:29.764[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Nimbus, peer=16...b81hCx, requestId=223310
Jul-07 23:59:29.826[NETWORK]       verbose: Req  done method=ping, encoding=ssz_snappy, client=Nimbus, peer=16...b81hCx, requestId=223310, numResponse=1
Jul-07 23:59:29.886[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Prysm, peer=16...TNFJH1, requestId=223311, body=7
Jul-07 23:59:29.887[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Prysm, peer=16...TNFJH1, requestId=223311
Jul-07 23:59:29.920[NETWORK]         debug: Resp received request method=ping, client=Teku, peer=16...tRMb5C, requestId=617662, body=9
Jul-07 23:59:29.920[NETWORK]       verbose: Resp done method=ping, client=Teku, peer=16...tRMb5C, requestId=617662
Jul-07 23:59:30.105[NETWORK]       verbose: Req  done method=ping, encoding=ssz_snappy, client=Prysm, peer=16...TNFJH1, requestId=223311, numResponse=1
Jul-07 23:59:30.233[NETWORK]         debug: Resp received request method=ping, client=Teku, peer=16...Y4Sk7n, requestId=617663, body=6
Jul-07 23:59:30.234[NETWORK]       verbose: Resp done method=ping, client=Teku, peer=16...Y4Sk7n, requestId=617663
Jul-07 23:59:30.788[NETWORK]         debug: Resp received request method=ping, client=Teku, peer=16...RHxnGh, requestId=617664, body=1
Jul-07 23:59:30.789[NETWORK]       verbose: Resp done method=ping, client=Teku, peer=16...RHxnGh, requestId=617664
Jul-07 23:59:31.574[NETWORK]         debug: Resp received request method=ping, client=Prysm, peer=16...W4FzZB, requestId=617666, body=8
Jul-07 23:59:31.575[NETWORK]       verbose: Resp done method=ping, client=Prysm, peer=16...W4FzZB, requestId=617666
Jul-07 23:59:31.843[NETWORK]         debug: Resp received request method=ping, client=Prysm, peer=16...xxC1bz, requestId=617668, body=22
Jul-07 23:59:31.844[NETWORK]       verbose: Resp done method=ping, client=Prysm, peer=16...xxC1bz, requestId=617668
Jul-07 23:59:35.007[NETWORK]         debug: Resp received request method=ping, client=Teku, peer=16...ta9ahW, requestId=617669, body=45
Jul-07 23:59:35.009[NETWORK]       verbose: Resp done method=ping, client=Teku, peer=16...ta9ahW, requestId=617669
Jul-07 23:59:36.229[NETWORK]         debug: Resp received request method=ping, client=Teku, peer=16...kZKAAL, requestId=617670, body=1
Jul-07 23:59:36.233[NETWORK]       verbose: Resp done method=ping, client=Teku, peer=16...kZKAAL, requestId=617670
Jul-07 23:59:39.668[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Prysm, peer=16...T5F4ms, requestId=223313
Jul-07 23:59:39.867[NETWORK]         debug: Req  sending request method=ping, encoding=ssz_snappy, client=Prysm, peer=16...T5F4ms, requestId=223313, body=7
Jul-07 23:59:39.868[NETWORK]         debug: Req  request sent method=ping, encoding=ssz_snappy, client=Prysm, peer=16...T5F4ms, requestId=223313
Jul-07 23:59:39.922[NETWORK]         debug: Resp received request method=ping, client=Teku, peer=16...tRMb5C, requestId=617671, body=9
Jul-07 23:59:39.923[NETWORK]       verbose: Resp done method=ping, client=Teku, peer=16...tRMb5C, requestId=617671
Jul-07 23:59:40.109[NETWORK]       verbose: Req  done method=ping, encoding=ssz_snappy, client=Prysm, peer=16...T5F4ms, requestId=223313, numResponse=1
Jul-07 23:59:40.231[NETWORK]         debug: Resp received request method=ping, client=Teku, peer=16...Y4Sk7n, requestId=617672, body=6
Jul-07 23:59:40.232[NETWORK]       verbose: Resp done method=ping, client=Teku, peer=16...Y4Sk7n, requestId=617672
Jul-07 23:59:41.673[NETWORK]         debug: Resp received request method=ping, client=Teku, peer=16...RHxnGh, requestId=617673, body=1
Jul-07 23:59:41.678[NETWORK]       verbose: Resp done method=ping, client=Teku, peer=16...RHxnGh, requestId=617673
Jul-07 23:59:44.960[NETWORK]         debug: Resp received request method=ping, client=Teku, peer=16...ta9ahW, requestId=617674, body=45
Jul-07 23:59:44.961[NETWORK]       verbose: Resp done method=ping, client=Teku, peer=16...ta9ahW, requestId=617674
Jul-07 23:59:46.185[NETWORK]         debug: Resp received request method=ping, client=Teku, peer=16...kZKAAL, requestId=617675, body=1
Jul-07 23:59:46.186[NETWORK]       verbose: Resp done method=ping, client=Teku, peer=16...kZKAAL, requestId=617675
Jul-07 23:59:48.252[NETWORK]         debug: Resp received request method=ping, client=Lighthouse, peer=16...tXTAHY, requestId=617676, body=1865
Jul-07 23:59:48.253[NETWORK]       verbose: Resp done method=ping, client=Lighthouse, peer=16...tXTAHY, requestId=617676
Jul-07 23:59:49.964[NETWORK]         debug: Resp received request method=ping, client=Teku, peer=16...tRMb5C, requestId=617678, body=9
Jul-07 23:59:49.966[NETWORK]       verbose: Resp done method=ping, client=Teku, peer=16...tRMb5C, requestId=617678
Jul-07 23:59:50.235[NETWORK]         debug: Resp received request method=ping, client=Teku, peer=16...Y4Sk7n, requestId=617679, body=6
Jul-07 23:59:50.236[NETWORK]       verbose: Resp done method=ping, client=Teku, peer=16...Y4Sk7n, requestId=617679
Jul-07 23:59:50.691[NETWORK]         debug: Resp received request method=ping, client=Teku, peer=16...RHxnGh, requestId=617680, body=1
Jul-07 23:59:50.692[NETWORK]       verbose: Resp done method=ping, client=Teku, peer=16...RHxnGh, requestId=617680
Jul-07 23:59:55.016[NETWORK]         debug: Resp received request method=ping, client=Teku, peer=16...ta9ahW, requestId=617682, body=45
Jul-07 23:59:55.018[NETWORK]       verbose: Resp done method=ping, client=Teku, peer=16...ta9ahW, requestId=617682
Jul-07 23:59:55.595[NETWORK]         debug: Resp received request method=ping, client=Prysm, peer=16...KnuCCp, requestId=617684, body=191
Jul-07 23:59:55.595[NETWORK]       verbose: Resp done method=ping, client=Prysm, peer=16...KnuCCp, requestId=617684
Jul-07 23:59:56.230[NETWORK]         debug: Resp received request method=ping, client=Teku, peer=16...kZKAAL, requestId=617685, body=1
Jul-07 23:59:56.231[NETWORK]       verbose: Resp done method=ping, client=Teku, peer=16...kZKAAL, requestId=617685
Jul-07 23:59:59.670[NETWORK]         debug: Req  dialing peer method=ping, encoding=ssz_snappy, client=Prysm, peer=16...sGmKgx, requestId=223317
Jul-07 23:59:59.920[NETWORK]         debug: Resp received request method=ping, client=Teku, peer=16...tRMb5C, requestId=617686, body=9
Jul-07 23:59:59.921[NETWORK]       verbose: Resp done method=ping, client=Teku, peer=16...tRMb5C, requestId=617686

wemeetagain avatar Jul 10 '22 00:07 wemeetagain

I'm seeing a memory leak, not sure if it's related to this PR. Screenshot from 2022-07-12 09-51-41

wemeetagain avatar Jul 12 '22 14:07 wemeetagain

Also the peer count seems off: the number of connected peers differs from the number of peers gossipsub is tracking.

Screenshot from 2022-07-12 10-01-25

wemeetagain avatar Jul 12 '22 15:07 wemeetagain

Some more metrics

Screenshot from 2022-07-12 10-05-36 Screenshot from 2022-07-12 10-06-05 Screenshot from 2022-07-12 10-07-11 Screenshot from 2022-07-12 10-08-39

wemeetagain avatar Jul 12 '22 15:07 wemeetagain

Searching through 'Too many' in the logs shows a large number of failed inbound streams. It turns out that most, if not all, of the errors look like this:

2022-07-12T11:09:11.012Z libp2p:upgrader outbound: incoming stream opened on /eth2/beacon_chain/req/goodbye/1/ssz_snappy
2022-07-12T11:09:11.013Z libp2p:upgrader:error Error: Too many incoming protocol streams
    at file:///home/cayman/Code/lodestar/node_modules/libp2p/src/upgrader.ts:345:31
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
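The per-protocol stream limits that trigger this error are set when registering a protocol handler in recent js-libp2p versions. A hedged configuration sketch follows — the limit values are made up, and the option names should be verified against the installed js-libp2p version:

```typescript
// Sketch, not Lodestar's actual code: raising the per-protocol stream limits
// that the upgrader enforces when it throws "Too many incoming protocol streams".
// Option names follow js-libp2p's handle() options; verify before relying on them.
await libp2p.handle(
  "/eth2/beacon_chain/req/goodbye/1/ssz_snappy",
  ({ stream }) => {
    // ... handle the inbound goodbye request on `stream`
  },
  {
    maxInboundStreams: 32, // hypothetical value; the default is lower
    maxOutboundStreams: 32, // hypothetical value
  }
);
```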

wemeetagain avatar Jul 13 '22 14:07 wemeetagain

Looking at logs/metrics on contabo-8

Consistently low peer count (~20ish). Choppy metrics (metrics update timeout?)

This one is weird because we had already connected to this peer. It may be nothing (the peer may have just turned off their node). Only seen with Prysm.

Jul-16 00:00:54.651[NETWORK]       ^[[36mverbose^[[39m: Req  error method=ping, encoding=ssz_snappy, client=Prysm, peer=16...6YeGaz, requestId=55618 code=REQUEST_ERROR_DIAL_ERROR, error=All promises were rejected
AggregateError: All promises were rejected
Error: REQUEST_ERROR_DIAL_ERROR
    at file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/request/index.ts:100:15
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at sendRequest (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/request/index.ts:86:20)
    at ReqResp.sendRequest (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/reqResp.ts:156:22)
    at ReqResp.ping (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/reqResp.ts:102:12)
    at PeerManager.requestPing (file:///usr/src/lodestar/packages/beacon-node/src/network/peers/peerManager.ts:348:25)
Jul-16 00:00:54.652[NETWORK]       ^[[36mverbose^[[39m: Req  error method=status, encoding=ssz_snappy, client=Prysm, peer=16...6YeGaz, requestId=55622 code=REQUEST_ERROR_DIAL_ERROR, error=All promises were rejected
AggregateError: All promises were rejected
Error: REQUEST_ERROR_DIAL_ERROR
    at file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/request/index.ts:100:15
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at sendRequest (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/request/index.ts:86:20)
    at ReqResp.sendRequest (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/reqResp.ts:156:22)
    at ReqResp.status (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/reqResp.ts:94:12)
    at PeerManager.requestStatus (file:///usr/src/lodestar/packages/beacon-node/src/network/peers/peerManager.ts:360:27)
    at async Promise.all (index 0)
    at PeerManager.requestStatusMany (file:///usr/src/lodestar/packages/beacon-node/src/network/peers/peerManager.ts:369:7)
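For context, the `AggregateError: All promises were rejected` above is the rejection `Promise.any()` produces when every candidate promise (here, every dial attempt for the peer) fails. A self-contained reproduction of that error shape, with a hypothetical `dialFirstSuccessful` helper:

```javascript
// Promise.any resolves with the first fulfilled promise and rejects with an
// AggregateError (message "All promises were rejected") only when all reject,
// which is what happens when every address for a peer fails to dial.
async function dialFirstSuccessful(attempts) {
  try {
    return await Promise.any(attempts);
  } catch (err) {
    // err.errors holds the individual per-address failures
    return `${err.message} (${err.errors.length} attempts)`;
  }
}
```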

Several stream resets like this from prysm

Jul-16 00:01:05.864[NETWORK]       ^[[36mverbose^[[39m: Req  error method=ping, encoding=ssz_snappy, client=Prysm, peer=16...okdd6X, requestId=55631 stream reset
Error: stream reset
    at Object.reset (file:///usr/src/lodestar/node_modules/@libp2p/mplex/src/stream.ts:140:27)
    at MplexStreamMuxer._handleIncoming (file:///usr/src/lodestar/node_modules/@libp2p/mplex/src/mplex.ts:298:16)
    at file:///usr/src/lodestar/node_modules/@libp2p/mplex/src/mplex.ts:191:20
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at MplexStreamMuxer.sink (file:///usr/src/lodestar/node_modules/@libp2p/mplex/src/mplex.ts:185:9)

Still a few 'Too many outgoing protocol streams' errors on status requests to a peer

Jul-16 00:01:06.735[NETWORK]       ^[[36mverbose^[[39m: Req  error method=status, encoding=ssz_snappy, client=Prysm, peer=16...Dkr5sw, requestId=55634 code=REQUEST_ERROR_DIAL_ERROR, error=Too many outgoing protocol streams
Error: Too many outgoing protocol streams
    at ConnectionImpl.newStream [as _newStream] (file:///usr/src/lodestar/node_modules/libp2p/src/upgrader.ts:392:27)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at ConnectionImpl.newStream (file:///usr/src/lodestar/node_modules/@libp2p/connection/src/index.ts:121:20)
    at EventTarget.dialProtocol (file:///usr/src/lodestar/node_modules/libp2p/src/libp2p.ts:421:12)
    at file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/request/index.ts:89:22
    at withTimeout (file:///usr/src/lodestar/packages/utils/src/timeout.ts:19:12)
    at sendRequest (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/request/index.ts:86:20)
    at ReqResp.sendRequest (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/reqResp.ts:156:22)
    at ReqResp.status (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/reqResp.ts:94:12)
Error: REQUEST_ERROR_DIAL_ERROR
    at file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/request/index.ts:100:15
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at sendRequest (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/request/index.ts:86:20)
    at ReqResp.sendRequest (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/reqResp.ts:156:22)
    at ReqResp.status (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/reqResp.ts:94:12)
    at PeerManager.requestStatus (file:///usr/src/lodestar/packages/beacon-node/src/network/peers/peerManager.ts:360:27)
    at async Promise.all (index 1)
    at PeerManager.requestStatusMany (file:///usr/src/lodestar/packages/beacon-node/src/network/peers/peerManager.ts:369:7)

Sometimes we're not getting ping or status responses?

Jul-16 00:01:39.887[NETWORK]       ^[[36mverbose^[[39m: Req  error method=ping, encoding=ssz_snappy, client=Unknown, peer=16...5DzgaP, requestId=55655 code=REQUEST_ERROR_EMPTY_RESPONSE
Error: REQUEST_ERROR_EMPTY_RESPONSE
    at file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/request/collectResponses.ts:20:13
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at runNextTicks (node:internal/process/task_queues:65:3)
    at listOnTimeout (node:internal/timers:526:9)
    at processTimers (node:internal/timers:500:7)
    at withTimeout (file:///usr/src/lodestar/packages/utils/src/timeout.ts:19:12)
    at sendRequest (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/request/index.ts:133:25)
    at ReqResp.sendRequest (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/reqResp.ts:156:22)
    at ReqResp.ping (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/reqResp.ts:102:12)
Jul-16 00:01:39.888[NETWORK]       ^[[36mverbose^[[39m: Req  error method=status, encoding=ssz_snappy, client=Unknown, peer=16...5DzgaP, requestId=55656 code=REQUEST_ERROR_EMPTY_RESPONSE
Error: REQUEST_ERROR_EMPTY_RESPONSE
    at file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/request/collectResponses.ts:20:13
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at runNextTicks (node:internal/process/task_queues:65:3)
    at listOnTimeout (node:internal/timers:526:9)
    at processTimers (node:internal/timers:500:7)
    at withTimeout (file:///usr/src/lodestar/packages/utils/src/timeout.ts:19:12)
    at sendRequest (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/request/index.ts:133:25)
    at ReqResp.sendRequest (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/reqResp.ts:156:22)
    at ReqResp.status (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/reqResp.ts:94:12)

A similar thing happens with a goodbye

Jul-16 00:05:57.605[NETWORK]       ^[[36mverbose^[[39m: Req  error method=goodbye, encoding=ssz_snappy, client=Teku, peer=16...EEC8aU, requestId=55798 code=REQUEST_ERROR_EMPTY_RESPONSE
Error: REQUEST_ERROR_EMPTY_RESPONSE
    at file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/request/collectResponses.ts:20:13
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at withTimeout (file:///usr/src/lodestar/packages/utils/src/timeout.ts:19:12)
    at sendRequest (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/request/index.ts:133:25)
    at ReqResp.sendRequest (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/reqResp.ts:156:22)
    at ReqResp.goodbye (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/reqResp.ts:98:5)
    at PeerManager.goodbyeAndDisconnect (file:///usr/src/lodestar/packages/beacon-node/src/network/peers/peerManager.ts:604:7)
Jul-16 00:05:57.605[NETWORK]       ^[[36mverbose^[[39m: Failed to send goodbye peer=16...EEC8aU method=goodbye, encoding=ssz_snappy, peer=16...EEC8aU, code=REQUEST_ERROR_EMPTY_RESPONSE
Error: REQUEST_ERROR_EMPTY_RESPONSE
    at sendRequest (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/request/index.ts:171:13)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at ReqResp.sendRequest (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/reqResp.ts:156:22)
    at ReqResp.goodbye (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/reqResp.ts:98:5)
    at PeerManager.goodbyeAndDisconnect (file:///usr/src/lodestar/packages/beacon-node/src/network/peers/peerManager.ts:604:7)

And this one is strange: we've made a connection, then fail while trying to update metadata

Jul-16 00:08:29.373[NETWORK]       ^[[36mverbose^[[39m: Req  error method=metadata, encoding=ssz_snappy, client=Teku, peer=16...ugsGLm, requestId=55953 code=REQUEST_ERROR_DIAL_ERROR, error=stream ended before 1 bytes became available
Error: stream ended before 1 bytes became available
    at file:///usr/src/lodestar/node_modules/it-reader/src/index.ts:44:9
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at runNextTicks (node:internal/process/task_queues:65:3)
    at listOnTimeout (node:internal/timers:526:9)
    at processTimers (node:internal/timers:500:7)
    at Object.next (file:///usr/src/lodestar/node_modules/@libp2p/multistream-select/src/multistream.ts:47:23)
    at abortable (file:///usr/src/lodestar/node_modules/abortable-iterator/src/index.ts:42:18)
    at decoder (file:///usr/src/lodestar/node_modules/it-length-prefixed/src/decode.ts:107:22)
    at first (/usr/src/lodestar/node_modules/it-first/index.js:11:20)
Error: REQUEST_ERROR_DIAL_ERROR
    at file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/request/index.ts:100:15
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at runNextTicks (node:internal/process/task_queues:65:3)
    at listOnTimeout (node:internal/timers:526:9)
    at processTimers (node:internal/timers:500:7)
    at sendRequest (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/request/index.ts:86:20)
    at ReqResp.sendRequest (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/reqResp.ts:156:22)
    at ReqResp.metadata (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/reqResp.ts:108:12)
    at PeerManager.requestMetadata (file:///usr/src/lodestar/packages/beacon-node/src/network/peers/peerManager.ts:340:29)

A few occurrences of this error, only with Lighthouse

Jul-16 00:09:41.549[NETWORK]       ^[[36mverbose^[[39m: Resp error method=ping, client=Lighthouse, peer=16...MLTEPx, requestId=42983 Timeout
Error: Timeout
    at timeoutPromise (file:///usr/src/lodestar/packages/utils/src/timeout.ts:15:11)
    at runMicrotasks (<anonymous>)
    at runNextTicks (node:internal/process/task_queues:61:5)
    at processTimers (node:internal/timers:497:9)
    at withTimeout (file:///usr/src/lodestar/packages/utils/src/timeout.ts:19:12)
    at requestHandlerSource (file:///usr/src/lodestar/packages/beacon-node/src/network/reqresp/response/index.ts:58:29)
    at file:///usr/src/lodestar/node_modules/it-handshake/src/index.ts:29:5
    at abortable (file:///usr/src/lodestar/node_modules/abortable-iterator/src/index.ts:42:18)
    at Object.sink (file:///usr/src/lodestar/node_modules/@libp2p/mplex/src/stream.ts:164:26)
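For context on the `Timeout` error above: a `withTimeout` helper like the one in the stack trace typically races the wrapped work against a timer. This is a generic sketch, not Lodestar's actual `packages/utils/src/timeout.ts` implementation:

```javascript
// Generic sketch of a withTimeout helper: rejects with Error('Timeout') if
// the wrapped promise does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('Timeout')), ms);
  });
  // Always clear the timer so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```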

wemeetagain avatar Jul 16 '22 14:07 wemeetagain

here's a snippet of a CPU profile of a running node:

Self     Total      Function                          Library
35.32 %  35.32 %    concat             concat.js:1    uint8arrays
35.32 %  35.32 %      slice            index.js:121   uint8arraylist
 9.75 %   9.75 %        ReadHandlers   decode.js:11   it-length-prefixed
 8.49 %   8.49 %        decoder        decode.js:59   it-length-prefixed
 8.39 %   8.39 %        (anonymous)    index.js:28    it-handshake
 4.29 %   4.29 %        write          decode.js:8    mplex
 4.29 %   4.29 %        bytesDecode    bytes.js:13    protons-runtime

35% of run time is dedicated to concatenating uint8arrays :(
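To illustrate why this dominates the profile: concatenating into a fresh Uint8Array on every chunk re-copies all previously seen bytes, making total work quadratic in stream length, whereas collecting chunks and copying once is linear (essentially what uint8arraylist is meant to avoid). A self-contained comparison, with hypothetical function names:

```javascript
// Quadratic: each step allocates a new array and re-copies everything so far.
function concatPerChunk(chunks) {
  let out = new Uint8Array(0);
  for (const c of chunks) {
    const next = new Uint8Array(out.length + c.length);
    next.set(out, 0);
    next.set(c, out.length); // copies ALL previous bytes again
    out = next;
  }
  return out;
}

// Linear: total size is computed up front and each byte is copied exactly once.
function concatOnce(chunks) {
  const total = chunks.reduce((sum, c) => sum + c.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}
```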

wemeetagain avatar Jul 22 '22 13:07 wemeetagain

Current status:

  • Testing against the latest version of libp2p
  • Local sanity checks passing (run the node, wait for peers)
  • Deployed this branch to the feat1 group :crossed_fingers:

Will fix unit tests and merge conflicts now

wemeetagain avatar Aug 15 '22 14:08 wemeetagain

Enabled libp2p metrics and attached a new dashboard exposing them libp2p_dashboard

wemeetagain avatar Aug 23 '22 20:08 wemeetagain

Thanks for sharing @wemeetagain. No expectation to respond as I don't want to create extra work, but if there's a place you can point to, I'd be interested in knowing what the remaining work is and whether anything is needed from js-libp2p at this point.

BigLep avatar Aug 23 '22 21:08 BigLep

I don't think there's anything remaining on js-libp2p's side that we need.

Performance is now mostly on par with or better than our unstable and stable branches. We've redeployed 67da674 across a group in our fleet (roughly 24 hrs ago) and are monitoring metrics to iron out the last remaining things.

The big thing we're working through:

  • Metrics show external memory (e.g. ArrayBuffer/Buffer memory, native code memory, etc.) grows like a memory leak, but occasionally and inconsistently drops back to normal levels. Screenshot from 2022-08-24 11-15-11 We're still figuring out the root cause. (I'm hoping that https://github.com/ChainSafe/js-libp2p-gossipsub/pull/325 will fix this but we'll see)
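One way to track this kind of leak outside of Grafana is Node's own `process.memoryUsage()`, which separates heap memory from external/ArrayBuffer memory. A minimal sketch — `externalMemorySnapshot` is a hypothetical helper, not part of Lodestar:

```javascript
// Sketch: sample Node's external and ArrayBuffer memory, the counters that
// were growing in the graphs above. process.memoryUsage() is Node core;
// everything else here is illustrative.
function externalMemorySnapshot() {
  const { external, arrayBuffers, heapUsed } = process.memoryUsage();
  const toMB = (n) => Math.round((n / 1024 / 1024) * 10) / 10;
  return {
    externalMB: toMB(external),
    arrayBuffersMB: toMB(arrayBuffers),
    heapUsedMB: toMB(heapUsed),
  };
}
```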

wemeetagain avatar Aug 24 '22 16:08 wemeetagain

cc @achingbrain we finally did it buddy

wemeetagain avatar Sep 27 '22 17:09 wemeetagain

Some gossipsub metric differences I noticed after 6h-7h (in a 12h timeframe):

  • Heartbeat is way faster Screen Shot 2022-09-28 at 08 46 58

  • Score functions are called less Screen Shot 2022-09-28 at 08 22 41

  • We send out way less RPC control messages Screen Shot 2022-09-28 at 08 23 43

  • We receive more RPC PRUNE messages ===> need to investigate! Screen Shot 2022-09-28 at 08 24 34

  • More gossip attestation messages are dropped ==> something to notice Screen Shot 2022-09-28 at 08 27 52

  • Besides, epoch transition is faster ==> is this a coincidence? Screen Shot 2022-09-28 at 08 30 55

  • SubmitPoolAttestations request/response time is higher ===> this is a concern as we already have issue #4600 after 1.1.0 Screen Shot 2022-09-28 at 08 35 08

twoeths avatar Sep 28 '22 01:09 twoeths