teku icon indicating copy to clipboard operation
teku copied to clipboard

Teku + Besu, attestations randomly missed since the merge

Open totoCZ opened this issue 2 years ago • 17 comments

besu 22.7.2 teku 22.9.0

Sep 15 15:06:00 eth-consensus teku[522445]: 2022-09-15 15:06:00.673 INFO - Attestation performance: epoch 146952, expected 6, produced 6, included 5 (83%), distance 1 / 1.00 / 1>

teku log https://termbin.com/573u besu log https://termbin.com/41ic

It's not clear why produced is 6 and included 5.

The end result is 80% performance.

image

totoCZ avatar Sep 15 '22 15:09 totoCZ

Similar symptoms for me with the same client combo. I noticed that there are suspiciously many empty slots in the Teku log:

15:27:15.076 INFO - Slot Event *** Slot: 4702634, Block: 4a74296200b1970641ffe28e52972ef1e8a1bd8777db5fb03021e34e00c37cd8, Justified: 146956, Finalized: 146955, Peers: 100 15:27:27.117 INFO - Slot Event *** Slot: 4702635, Block: ... empty, Justified: 146956, Finalized: 146955, Peers: 100 15:27:39.125 INFO - Slot Event *** Slot: 4702636, Block: d7b9bf7534cbd83a5d5c2b20822a13e0b65279c8e330aa05902de64d6b298839, Justified: 146956, Finalized: 146955, Peers: 99 15:27:51.117 INFO - Slot Event *** Slot: 4702637, Block: d1f59414c48170f16098c6811981aabdc118d4063068118315ba62be3be19b03, Justified: 146956, Finalized: 146955, Peers: 100 15:28:03.123 INFO - Slot Event *** Slot: 4702638, Block: ab616d4623d339bbad3961fc18f7c93afa6cfcf3be5c5d1381fc08ee993ae3fe, Justified: 146956, Finalized: 146955, Peers: 100 15:28:15.123 INFO - Slot Event *** Slot: 4702639, Block: ... empty, Justified: 146956, Finalized: 146955, Peers: 100 15:28:27.121 INFO - Slot Event *** Slot: 4702640, Block: 774ecc2744e98800144a225101ad7d2128a6c434d99c05bfc561313ff56aa49a, Justified: 146956, Finalized: 146955, Peers: 100 15:28:39.121 INFO - Slot Event *** Slot: 4702641, Block: 725c9e3d1f7ea499cfbe575c0f292a9d2c46cf0d8c84f3e6eda8dcb3b917a928, Justified: 146956, Finalized: 146955, Peers: 100 15:28:51.123 INFO - Slot Event *** Slot: 4702642, Block: ... empty, Justified: 146956, Finalized: 146955, Peers: 100 15:29:03.068 INFO - Slot Event *** Slot: 4702643, Block: 03c1e6a62000c875bb610dd72de9f5192ac7efa1df73fc72b55667414f6adf86, Justified: 146956, Finalized: 146955, Peers: 100 15:29:15.115 INFO - Slot Event *** Slot: 4702644, Block: 47a4440774b1d4d203889bf99ed6466ce4b3221811d99e1cfed9d9c88bb86848, Justified: 146956, Finalized: 146955, Peers: 100 15:29:27.118 INFO - Slot Event *** Slot: 4702645, Block: ... empty, Justified: 146956, Finalized: 146955, Peers: 100 15:29:39.177 INFO - Slot Event *** Slot: 4702646, Block: e329fc7107dd79d9922729ef7f2cd60f811275cbcc08724fb7c9542cf087901b, Justified: 146956, Finalized: 146955, Peers: 100

torfbolt avatar Sep 15 '22 15:09 torfbolt

Having the same problem, also seeing execution client offline when connected to Besu.

patwhite avatar Sep 15 '22 16:09 patwhite

For the record, we're actively investigating this. There are two issues on-going, one is slow responses from Besu which delays importing blocks. The other is slow leveldb performance in Teku which can also delay blocks and can lead to attestations failing to be created as the beacon node falls behind.

ajsutton avatar Sep 15 '22 23:09 ajsutton

Issue is also reported here: https://github.com/hyperledger/besu/issues/4398

ronaldjmaas avatar Sep 16 '22 15:09 ronaldjmaas

Having the same issue with Teku + Geth. Attestation performance has gone from 95%+ pre-merge to around 80% post-merge.

iamkoa avatar Sep 16 '22 17:09 iamkoa

With the latest teku update the frequency has decreased but still happening once every few hours.

totoCZ avatar Sep 17 '22 08:09 totoCZ

With the latest teku update the frequency has decreased but still happening once every few hours.

Are you seeing "Published attestation" events for the missed attestations and are they published on time (not much more than 4s into the slot the attestation is for)? You can use https://slots.symphonious.net to find the start time of slots.

There are a lot of factors involved with getting an attestation included and most of them aren't in your control. The main thing is that teku needs to publish the attestation on time. Teku and the EL (besu in this case) also need to work together to import blocks quickly so that your attestation has the right head block. With 22.9.1-RC1 in all our metrics we've seen Teku processing blocks in a timely fashion but being held up by slow responses from the EL - particularly besu which is still investigating some performance issues.

One thing that you can do is to enable the block time metrics with --Xmetrics-block-timing-tracking-enabled. This should give you a clearer picture of where Teku is spending time when processing blocks. One key information that help us is execution_payload_result_received (it is part of the log entry for block timing) and was introduced in 22.9.1-RC1 (previously time waiting on the EL turned up in transaction_prepared section). That's time waiting for the EL, beyond the time Teku was processing the block in parallel. If the execution_payload_result_received value is high (more than 500ms or so) it is already a clue that the EL is taking too long which will lead to more incorrect head votes, and a higher likelihood that the attestation won't be included even if it is published on time.

ajsutton avatar Sep 17 '22 11:09 ajsutton

Thanks for the update. I will wait for a new besu release and if it's not fixed with that then I'll do more debugging

totoCZ avatar Sep 17 '22 12:09 totoCZ

Still seeing a lot of empty slot entries in the log with 22.9.1

torfbolt avatar Sep 18 '22 08:09 torfbolt

Sep 18 09:44:03 eth-consensus teku[684683]: 2022-09-18 09:44:03.817 WARN  - Late Block Import *** Block: 9ec8c30f1b881288946b1343279f92ece901e54437f63a0f030746a2f57c4574 (4722518) proposer 354772 arrival 1379ms, pre-state_retrieved +27ms, processed +191ms, execution_payload_result_received +3199ms, begin_importing +1ms, transaction_prepared +0ms, transaction_committed +0ms, completed +19ms
Sep 18 09:40:15 eth-consensus teku[684683]: 2022-09-18 09:40:15.465 WARN  - Late Block Import *** Block: c69a596582d72f158ec25e93667e5641faa56e20adb74d359a470c7c5244c909 (4722499) proposer 294130 arrival 1615ms, pre-state_retrieved +29ms, processed +209ms, execution_payload_result_received +2591ms, begin_importing +0ms, transaction_prepared +1ms, transaction_committed +1ms, completed +18ms
Sep 18 09:39:51 eth-consensus teku[684683]: 2022-09-18 09:39:51.315 WARN  - Late Block Import *** Block: c5eb6c61c126031706113a6e2da3db9d0e02051cc6ac75c717ebe1050d88cec6 (4722497) proposer 315427 arrival 2836ms, pre-state_retrieved +22ms, processed +167ms, execution_payload_result_received +1260ms, begin_importing +0ms, transaction_prepared +1ms, transaction_committed +3ms, completed +26ms
Sep 18 09:39:41 eth-consensus teku[684683]: 2022-09-18 09:39:41.311 WARN  - Late Block Import *** Block: 11fefc3bbd1ab5da8986b4a5c2b0a2d7429dfa07a9a0d62d2399fc4af93cd0a4 (4722496) proposer 204277 arrival 1943ms, pre-state_retrieved +28ms, processed +250ms, execution_payload_result_received +4064ms, begin_importing +0ms, transaction_prepared +1ms, transaction_committed +0ms, completed +24ms
Sep 18 09:39:04 eth-consensus teku[684683]: 2022-09-18 09:39:04.049 WARN  - Late Block Import *** Block: 8185c339528b5fe442936ce15cf237bd1a4b970d26a7424a16d2f573b55f021d (4722493) proposer 11215 arrival 1115ms, pre-state_retrieved +31ms, processed +139ms, execution_payload_result_received +3741ms, begin_importing +0ms, transaction_prepared +1ms, transaction_committed +1ms, completed +20ms
Sep 18 09:37:15 eth-consensus teku[684683]: 2022-09-18 09:37:15.796 WARN  - Late Block Import *** Block: 4e53d6c09f8d613463a0c1dcb3aa09a8bcd32d8ec70be9de214cbeafe2a83bcf (4722484) proposer 16907 arrival 1489ms, pre-state_retrieved +34ms, processed +291ms, execution_payload_result_received +2943ms, begin_importing +1ms, transaction_prepared +0ms, transaction_committed +1ms, completed +37ms
Sep 18 09:33:15 eth-consensus teku[684683]: 2022-09-18 09:33:15.947 WARN  - Late Block Import *** Block: 9f8f6cc247e702b1ddff49e871a0f5eb90d0787e7ec3c1337c39387e2e91e628 (4722464) proposer 273783 arrival 2323ms, pre-state_retrieved +22ms, processed +206ms, execution_payload_result_received +2373ms, begin_importing +0ms, transaction_prepared +1ms, transaction_committed +0ms, completed +21ms
Sep 18 09:30:04 eth-consensus teku[684683]: 2022-09-18 09:30:04.153 WARN  - Late Block Import *** Block: de875ee4fb69a5810e8d1f2ef796614ac89c58c57cfdd05c0a6eb86ba4b180da (4722448) proposer 10461 arrival 2474ms, pre-state_retrieved +22ms, processed +212ms, execution_payload_result_received +2416ms, begin_importing +0ms, transaction_prepared +0ms, transaction_committed +1ms, completed +26ms

(Arrival in 2s is normal?)

totoCZ avatar Sep 18 '22 09:09 totoCZ

@torfbolt That can be caused by quite a lot of factors. I'd suggest enabling the late block timing logs as described in https://github.com/ConsenSys/teku/releases/tag/22.9.1 so you can see why the block is late importing.

ajsutton avatar Sep 18 '22 10:09 ajsutton

@totocz A 2 second arrival is relatively slow - ideally it should be under 1s, but it's not generally the end of the world. The main problem with those numbers is execution_payload_result_received which is the time that Teku is waiting on besu to respond and that's what's caused those blocks to complete importing after the 4s mark. Some of them would have imported late even if they'd been received quickly.

To give an idea of block arrival times. In the last day, 11% of blocks were received within 500ms, 50% of blocks were received between 500ms and 1s, 25% between 1s and 1.5s and 8% between 1.5s and 2s. Then about 5% of blocks were received later than that.

So with the time besu is taking most blocks would probably be imported just in time, but quite frequently blocks would be imported late and you'd see an empty slot message when really the block was received with enough time that it should have been imported in time. This is the common pattern we're seeing with 22.9.1 where it's the execution client that is now dominating the block import time and there's not much more Teku can do to improve things.

ajsutton avatar Sep 18 '22 10:09 ajsutton

Besides besu, is there anything that can be done to further speed up the arrivals? This is a powerful datacenter node in the UK with 100 peers. It's not like there should be any limit on my machine. Does it fully depend on p2p network conditions?

totoCZ avatar Sep 18 '22 10:09 totoCZ

The late arrivals are almost always because the block producer (someone else's validator) was slow in producing the block. There's nothing you can do about that.

ajsutton avatar Sep 18 '22 10:09 ajsutton

I also enabled --Xmetrics-block-timing-tracking-enabled and logged the Late Block Import lines for the last 2 hours:

execution_payload_result_received is higher than 500ms: 1073ms 925ms 1616ms 1125ms 1104ms 1195ms 1561ms 1243ms 675ms 450ms 1341ms

and arrival is also higher than 1000ms: 4625ms 3898ms 3898ms 2378ms 6597ms 3307ms 8275ms 3690ms 4132ms 3356ms 3884ms 4222ms

Full Log:

Sep 19 13:47:41 eth-server teku[387521]: 13:47:41.016 WARN  - Late Block Import *** Block: b1c24fabb4765e1abf2b9c89928dc206737583e2bfa8eca1e78000daf5ecba0a (4730336) proposer 134914 arrival 4625ms, pre-state_retrieved +7ms, processed +305ms, execution_payload_result_received +1073ms, begin_importing +0ms, transaction_prepared +0ms, transaction_committed +0ms, completed +6ms
Sep 19 14:40:39 eth-server teku[387521]: 14:40:39.931 WARN  - Late Block Import *** Block: 084df768999642371b46394047fdcc51a5f8380928e8d7df551bfa07be1a102c (4730601) proposer 6347 arrival 3898ms, pre-state_retrieved +7ms, processed +95ms, execution_payload_result_received +925ms, begin_importing +0ms, transaction_prepared +0ms, transaction_committed +0ms, completed +6ms
Sep 19 14:47:15 eth-server teku[387521]: 14:47:15.186 WARN  - Late Block Import *** Block: 965030340b20acc7f326f32a5ab5f42179d1e158a002bfc15dde2ce1e9d6c671 (4730634) proposer 268358 arrival 2378ms, pre-state_retrieved +9ms, processed +172ms, execution_payload_result_received +1616ms, begin_importing +0ms, transaction_prepared +1ms, transaction_committed +0ms, completed +9ms
Sep 19 14:50:42 eth-server teku[387521]: 14:50:42.801 WARN  - Late Block Import *** Block: 702fde89e7a4597e1faad53fce7670c680e5ac333150953d9c20b18bec6fd71b (4730651) proposer 96283 arrival 6597ms, pre-state_retrieved +8ms, processed +63ms, execution_payload_result_received +1125ms, begin_importing +0ms, transaction_prepared +0ms, transaction_committed +0ms, completed +8ms
Sep 19 14:58:03 eth-server teku[387521]: 14:58:03.526 WARN  - Late Block Import *** Block: 37be8af64f2685017e6230ff9d6a1ff67736a745809ee631d0ce341c3c68eb5a (4730688) proposer 332615 arrival 3307ms, pre-state_retrieved +5ms, processed +98ms, execution_payload_result_received +1104ms, begin_importing +0ms, transaction_prepared +0ms, transaction_committed +0ms, completed +12ms
Sep 19 15:09:08 eth-server teku[387521]: 15:09:08.588 WARN  - Late Block Import *** Block: 074e3791a628b74eca1d29073e998a72567781ac78189567708c92c0d0b7e279 (4730743) proposer 266432 arrival 8275ms, pre-state_retrieved +6ms, processed +107ms, execution_payload_result_received +1195ms, begin_importing +0ms, transaction_prepared +1ms, transaction_committed +0ms, completed +4ms
Sep 19 15:17:16 eth-server teku[387521]: 15:17:16.371 WARN  - Late Block Import *** Block: a0904112478161a75cb3b18e24bb7b9cafefb71dd706a30f23f582665dfb6ece (4730784) proposer 411853 arrival 3690ms, pre-state_retrieved +9ms, processed +102ms, execution_payload_result_received +1561ms, begin_importing +0ms, transaction_prepared +0ms, transaction_committed +1ms, completed +8ms
Sep 19 15:19:04 eth-server teku[387521]: 15:19:04.531 WARN  - Late Block Import *** Block: f9c26cb136d3a712583180beed111cf7986be16e87946a611ace0b9bf43efca9 (4730793) proposer 316624 arrival 4132ms, pre-state_retrieved +9ms, processed +137ms, execution_payload_result_received +1243ms, begin_importing +0ms, transaction_prepared +0ms, transaction_committed +0ms, completed +10ms
Sep 19 15:24:27 eth-server teku[387521]: 15:24:27.138 WARN  - Late Block Import *** Block: e09905215980d1d539c49018878efc6b4c57a87005424a08be0cdb9066d55db4 (4730820) proposer 423210 arrival 3356ms, pre-state_retrieved +8ms, processed +91ms, execution_payload_result_received +675ms, begin_importing +0ms, transaction_prepared +1ms, transaction_committed +0ms, completed +7ms
Sep 19 15:27:15 eth-server teku[387521]: 15:27:15.378 WARN  - Late Block Import *** Block: 8adbd498f02afb0fc001c8dd81813206c9d48b95b711a3df90ffb2059841ca2c (4730834) proposer 405111 arrival 3884ms, pre-state_retrieved +5ms, processed +36ms, execution_payload_result_received +450ms, begin_importing +0ms, transaction_prepared +0ms, transaction_committed +0ms, completed +3ms
Sep 19 15:29:52 eth-server teku[387521]: 15:29:52.646 WARN  - Late Block Import *** Block: 6fb6c3eed319d1fa0741de65cc9c182135308cf1073fc94de85fae500b2298cf (4730847) proposer 175686 arrival 4222ms, pre-state_retrieved +7ms, processed +66ms, execution_payload_result_received +1341ms, begin_importing +0ms, transaction_prepared +1ms, transaction_committed +0ms, completed +9ms

I did not have missing attestations during that time, however once in a while the distance in the Attestation performance Info looks as follows: distance 1 / 1.17 / 2

Maybe this information helps with debugging.

scheer88 avatar Sep 19 '22 13:09 scheer88

besu 22.7.3 https://termbin.com/fmng I guess now we need to wait for https://github.com/hyperledger/besu/pull/4409 ? Feel free to close this here if you want, I can move the issue over to besu

totoCZ avatar Sep 24 '22 17:09 totoCZ

Having the same issue with Teku + Geth. Attestation performance has gone from 95%+ pre-merge to around 80% post-merge.

Same here. I hardly ever missed an attestation before the merge. Now missing them every hour almost. image

AXDevTimothy avatar Oct 10 '22 18:10 AXDevTimothy

I miss one as well every 4 hours on average with besu + teku. I also see some attested epoch's that does give a negative award (those are not missed!) image

@totoCZ https://github.com/hyperledger/besu/pull/4409 seems to be merged, so will be in the next besu release is my guess

kiss81 avatar Oct 15 '22 20:10 kiss81

I'm going to close this one as we've ironed out the initial performance issues with Teku a while back and what's left just boils down to besu import times which they're working on.

For anyone who comes across this issue in the future, https://www.symphonious.net/2022/09/25/understanding-attestation-misses/ has a bunch of details for how to investigate why attestations were missed.

ajsutton avatar Nov 02 '22 00:11 ajsutton