Bor sync stuck at block 0x312d050

Open • eldimious opened this issue 2 years ago • 54 comments

System information

Bor client version: 1.2.1

Heimdall client version: 1.0.3

OS & Version: Linux

Environment: Polygon Mainnet

Type of node: Full

Overview of the problem

I have been running a full node with bor and heimdall via Docker for the last 2 months, but bor sync appears to have stalled 11 hours ago at block 0x312d050. I am getting the following logs from the bor Docker container:

bor                  | WARN [12-26|16:31:24.814] unable to handle whitelist milestone     err="missing blocks"
bor                  | INFO [12-26|16:31:36.814] Got new milestone from heimdall          start=51,584,847 end=51,584,869 hash=0x112ae9614d96a0db2fb572d324f1ca505983ef0b309b1c0970f698994964bb89
bor                  | WARN [12-26|16:31:36.815] unable to handle whitelist milestone     err="missing blocks"
bor                  | INFO [12-26|16:31:40.819] Got new checkpoint from heimdall         start=51,583,142 end=51,583,653 rootHash=0xbaa9de2414f3853a1be0556bd33ca614024e6a8b864940a482e2c84fa1527bf1
bor                  | WARN [12-26|16:31:40.819] Failed to whitelist checkpoint           err="missing blocks"
bor                  | WARN [12-26|16:31:40.819] unable to handle whitelist checkpoint    err="missing blocks"
bor                  | INFO [12-26|16:31:48.813] Got new milestone from heimdall          start=51,584,847 end=51,584,869 hash=0x112ae9614d96a0db2fb572d324f1ca505983ef0b309b1c0970f698994964bb89
bor                  | WARN [12-26|16:31:48.813] unable to handle whitelist milestone     err="missing blocks"
bor                  | INFO [12-26|16:32:00.814] Got new milestone from heimdall          start=51,584,847 end=51,584,869 hash=0x112ae9614d96a0db2fb572d324f1ca505983ef0b309b1c0970f698994964bb89
bor                  | WARN [12-26|16:32:00.815] unable to handle whitelist milestone     err="missing blocks"
bor                  | INFO [12-26|16:32:12.814] Got new milestone from heimdall          start=51,584,847 end=51,584,869 hash=0x112ae9614d96a0db2fb572d324f1ca505983ef0b309b1c0970f698994964bb89
bor                  | WARN [12-26|16:32:12.814] unable to handle whitelist milestone     err="missing blocks"
bor                  | INFO [12-26|16:32:24.814] Got new milestone from heimdall          start=51,584,870 end=51,584,892 hash=0xdef7276b17971f87470ffa0c516ec2a1de75fd12564106af2771f084d7bc63e8
bor                  | WARN [12-26|16:32:24.814] unable to handle whitelist milestone     err="missing blocks"
bor                  | INFO [12-26|16:32:36.815] Got new milestone from heimdall          start=51,584,870 end=51,584,892 hash=0xdef7276b17971f87470ffa0c516ec2a1de75fd12564106af2771f084d7bc63e8
bor                  | WARN [12-26|16:32:36.815] unable to handle whitelist milestone     err="missing blocks"
bor                  | WARN [12-26|16:32:44.111] Snapshot extension registration failed   peer=5f67ba47 err="peer connected on snap without compatible eth support"
bor                  | INFO [12-26|16:32:48.815] Got new milestone from heimdall          start=51,584,870 end=51,584,892 hash=0xdef7276b17971f87470ffa0c516ec2a1de75fd12564106af2771f084d7bc63e8
bor                  | WARN [12-26|16:32:48.815] unable to handle whitelist milestone     err="missing blocks"
bor                  | INFO [12-26|16:33:00.814] Got new milestone from heimdall          start=51,584,893 end=51,584,911 hash=0x72137465e871305c04cae0d017d60848a17b4f70caa302f7d3b8e55615a8ac54
bor                  | WARN [12-26|16:33:00.814] unable to handle whitelist milestone     err="missing blocks"

Any idea how I can fix it? I tried restarting the Docker container, but the error remains.

eldimious commented on Dec 26 '23
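The stuck height in the title is hexadecimal, while the log prints milestone ranges in decimal. Converting the head and comparing it with the first milestone gives a rough sense of the gap. A minimal sketch using only values copied from the post above; the helper name is hypothetical, and reading the "missing blocks" warning as "the milestone refers to blocks this node has not imported yet" is an interpretation, not something the log confirms.

```python
# Values copied from the issue title and the first milestone in the log above.
STUCK_HEAD_HEX = "0x312d050"
MILESTONE_START = 51_584_847
MILESTONE_END = 51_584_869


def blocks_behind(head_hex: str, target: int) -> int:
    """How many blocks a local head (given as a hex string) trails a target height."""
    return target - int(head_hex, 16)


head = int(STUCK_HEAD_HEX, 16)
print(f"local head:             {head:,}")
print(f"milestone range:        {MILESTONE_START:,}-{MILESTONE_END:,}")
print(f"gap to milestone start: {blocks_behind(STUCK_HEAD_HEX, MILESTONE_START):,}")
```

The head works out to 51,564,624, about 20,200 blocks short of the milestone start, which is roughly what ~2-second Polygon blocks add up to over the 11 hours the node had been stuck.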

same

psmahlii commented on Dec 27 '23

I also tried using debug.setHead to rewind the head by 1000+ blocks. Sync resumed for a few hours, but then stopped again.

Any idea how to fix it?

eldimious commented on Dec 27 '23
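For reference, debug.setHead in the bor attach console corresponds to the debug_setHead JSON-RPC method, which takes the new head as a hex-encoded block number. A minimal sketch that builds the request body for rewinding 1000 blocks from the stuck head in the title; set_head_payload is a hypothetical helper, and sending the body over HTTP assumes the node exposes the debug namespace on that transport, which is an assumption about your configuration.

```python
import json


def set_head_payload(current_head: int, rewind: int, request_id: int = 1) -> str:
    """Build a JSON-RPC body that rewinds the local head by `rewind` blocks."""
    if not 0 < rewind <= current_head:
        raise ValueError("rewind must be between 1 and the current head")
    target = current_head - rewind
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "debug_setHead",
        "params": [hex(target)],  # the block number is passed as a hex string
    })


# Stuck head from the issue title, rewound by 1000 blocks.
print(set_head_payload(int("0x312d050", 16), 1000))
```

In the console, the same rewind is debug.setHead("0x312cc68"). Rewinding discards every block above the target, so the node has to import them again; as described above, this only got sync moving for a few hours.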

Same issue, with the same bor and heimdall versions, although my node is stuck at a different height (51640301). I constantly get this WARN log: Dec 28 14:30:33 203078 bor[1828860]: WARN [12-28|14:30:33.977] unable to handle whitelist milestone err="missing blocks"

GeassV commented on Dec 28 '23

Can you check your peers using IPC and the admin.peers command?

0xKrishna commented on Dec 28 '23

@0xKrishna

enode://11e0cbb03a834019b0222f54bccf32512bef4294dd722642684762d1d01c84031c1075767195d9968dcdb9e38326f08b14547d8e33b0b67a0ef1aa0b045845d0@35.171.120.130:30303?discport=30315,
enode://b0f026f7ccfd5c1450e933572ae44b262a7d084647a30d0a8d9e2c8cab8d5b1c7721f3c60bfcd50c0fede114c7e2d316649389ba2449ca85d1ddd9e2947f1c28@147.135.100.106:30303?discport=30334,
enode://2d4bd1fa38182fa868a583fc946c8d5e4043b013381cf20927c16cf8f17b4f3e793c5e9f34fc785c52d887aab07181bdb0ebae50d9e3f05e5c14aed19f81929a@65.108.127.87:30303?discport=30340,
enode://ab879b4eaacf495ec760f2806e78509da80e327ba4262d8153698f88b0a95287a692bbaf3a3cece9ad27f889246c04e2b5ca8e75bf083acbb4806eb669cc3a77@35.171.120.130:30303?discport=30334,
enode://1a69f7dae12959a358b92a395ec79de2ab4601a59a5b0b951d4e6247da2101d7d6d77a919086251e70b552a49ae74d630e19233306a189a1b627c2115ecf3cfa@34.203.27.246:30303?discport=30320,
enode://574a9195f40a7c4bd68536167ef53a7385bab8934dfc8db94d013b1a73af76eb73f148536cb8b8365e8240728f6e80af0ddb4ead3a2544de907cce561839ce61@51.81.217.117:30303?discport=30323,
enode://142cce22e125325f4895b2268e32185f5dbe90f9c818ab135f16c7face23a55b46d0b78a0286595a262d4fa58ff314e7e2553e13f528a3c3e9616184b77f5b85@65.108.127.87:30303?discport=30323,
enode://50c8f9d2849a209383edd15dfd67ba0a8d3f5e9853fd1af9c1678f4aef2dc5e3817c34ddce9390d5e8dd4891ad7f66003a3bea5af9e288df6f26ed070d9bd741@54.38.217.112:30303?discport=30335,
enode://72be2da5ba01bc2f3a7764bf1d4f18550a36df629820ea0f6d37fe1cd1355d0f1c201b2a5f382e794ee56e0f5befa504e85e96548a45a0fba44bb6bd1075e28e@54.38.155.225:30303?discport=30306,
enode://53b53f55f2a1674873f8f58ee23616db8384f278a1206cf79c8c18d4ebc32b4424128229de2ea999803c08c9262974f1fb1f2b0d87ca6ec40aea1594c0ba0ef7@65.108.1.189:30303?discport=30337,
enode://eb0ee5596ea6df526eb7e0ace41f015bcb9ee4f27996c72ea15d1cd28ec69f89b6e64247696c0150111b52ca58810f5d0f42d59ac38fdb26ba7323bcc835475a@51.81.196.100:30303?discport=30313,
enode://c4a2a7c422ddce70a39164ce53762262bd5dc8917f5613b1c92c94affb36516e63f88721763a1dcfed5f36403e0fc21894e34c2981f2f6f1f100b9f186a986a1@51.38.72.15:30303?discport=30307,
enode://2197472b27c39587e2ae2c199e91527a25d25b2c1217f14c8d8b342068209a889913c7c1eb6f60044a0d28bd59ccec157d18ebb7918293e8878d11185831cf22@54.38.75.21:30303?discport=30320,
enode://b6d9bef47ce86b94331cdcfd2a1a91f28ab48db171aa70659973b3869988e7e4806fd24406c6f57187664643dffc0edf74e7a16ac315ca7933589357ec875550@51.38.72.15:30303?discport=30311,
enode://4585b746a2ae2f74575313199bd35159e8b679608fa1bd4e3a2823c0c24f8e49f9cb1e0c312de30a8b08c16a6666101897ffff47a6c162dca6ddb87c206c4cd2@66.70.233.151:30303?discport=30313,
enode://c8ab3d6ec8d7c1c7df462f55f02acaced2949ec4542475fa25ebb104feaa78a196f0e39cfc2bf1236ead1c647b734726cb9f4f03eb933c94f318cca160e5ce16@54.38.217.112:30303?discport=30334

eldimious commented on Dec 28 '23
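The enode list above is easier to reason about once each URL is split into its parts. A minimal sketch using only the standard library; parse_enode and peers_per_ip are hypothetical helpers, and the node IDs in the sample are shortened. Grouping by IP shows that the 16 entries above come from only 12 distinct addresses (35.171.120.130, 65.108.127.87, 54.38.217.112 and 51.38.72.15 each appear twice), which may mean several peers are run by the same operator; whether that matters for sync is not established in this thread.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlparse


def parse_enode(url: str) -> dict:
    """Split an enode URL into node ID, IP, TCP port and UDP (discovery) port."""
    u = urlparse(url.strip().rstrip(","))
    if u.scheme != "enode" or not u.username or u.port is None:
        raise ValueError(f"not an enode URL: {url!r}")
    query = parse_qs(u.query)
    return {
        "id": u.username,
        "ip": u.hostname,
        "tcp": u.port,
        # Without ?discport, discovery uses the same port as TCP.
        "udp": int(query["discport"][0]) if "discport" in query else u.port,
    }


def peers_per_ip(urls: list) -> dict:
    """Group node IDs by IP address to spot hosts that run several nodes."""
    groups = defaultdict(list)
    for url in urls:
        node = parse_enode(url)
        groups[node["ip"]].append(node["id"])
    return dict(groups)


# Node IDs shortened for readability; IPs and ports copied from the list above.
enodes = [
    "enode://11e0cbb0@35.171.120.130:30303?discport=30315,",
    "enode://ab879b4e@35.171.120.130:30303?discport=30334,",
    "enode://2d4bd1fa@65.108.127.87:30303?discport=30340,",
]
for ip, ids in peers_per_ip(enodes).items():
    print(ip, len(ids), ids)
```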

Can you check your peers using IPC and the admin.peers command?

Sure. I rolled back from bor v1.2.1 to v1.1.0 because some issues suggested that rolling back might help, and then noticed an interesting pattern: every time I restart the bor service, it syncs for a while and then gets stuck with the "whitelist milestone" log above.

> admin.peers [{ caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://6de3bbba54699dcc11b982c7970fdd938946d3638bab27d9006698b998447cf838891a310c61d6c74d042091366ba07690ef1f09a026fab28a31a06cd387b67b@13.57.125.97:30321", enr: "enr:-KO4QIY12LW3IWDW2JzqMdtg9Pyv7PEASdnlLFAEzUEuzOgVEvW5hWe2EB_Jd6iqKnRHi_SyP1INx6iDk3a6CMoyqOqGAYvR7YGvg2V0aMfGhNwIhlyAgmlkgnY0gmlwhA05fWGJc2VjcDI1NmsxoQNt47u6VGmdzBG5gseXD92TiUbTY4urJ9kAZpi5mER8-IRzbmFwwIN0Y3CCdnGDdWRwgnZx", id: "136d74cf29e85b49f991b1d97b5800f1a45968b0542642c47c970c1502762313", name: "bor/v1.1.0/linux-amd64/go1.20.10", network: { inbound: false, localAddress: "172.18.35.78:37836", remoteAddress: "13.57.125.97:30321", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://b8187a46754cdf631d67b89e3e73d5e061ab2ce5a62cc8a79cfd754b04dc5394b381f1d99d59a8b6baeb68b4c019512b59dcbdc0cb682320f96508331cf8e8f3@54.38.217.112:30303?discport=30324", id: "1c405a70749de50ea441c6c59c07e7d4dde5e18f47102a20b88db98cddcbb6a2", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:51320", remoteAddress: "54.38.217.112:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://256fe3efb2f83e4821f4d028273757e525da48bb69a3da5c4230a410d5b96e948a79ae42e60a4914092249ee3bb928756534c67b6c3003f0d08a180373735edc@65.108.1.189:30303?discport=30395", enr: "enr:-KO4QHQlnI0aegmfJbdsiPIskZywzNjBmulaKf9scy3wuCR_XirUnjEjwSsDfjJe40LWodLNpjLDW48N4MtdFEXOXh6GAYx2yUm_g2V0aMfGhNwIhlyAgmlkgnY0gmlwhEFsAb2Jc2VjcDI1NmsxoQIlb-Pvsvg-SCH00CgnN1flJdpIu2mj2lxCMKQQ1blulIRzbmFwwIN0Y3CCdl-DdWRwgna7", id: "3e8f038a2af1414377f24cacf7e6591b4007c60b8de292b7bec24d7a27cd9c49", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:52390", remoteAddress: "65.108.1.189:30303", static: 
false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://2cd2be98b78f486171994f32ca995f4d53a783172f360a9224181c3cb1b487bd88e95658cb05405642ee2455fc31ae0919f8b2699cc02ed9ed2aef09b9fc93c2@54.38.216.84:30303?discport=30331", enr: "enr:-KO4QN1KbAC8kuy161pxm8kHqtI8VMjk9cQjVFJT4s6TH3G-LJK4QAdY7LqugQ8Yt8-hYUzFDrqoaMFR3xQVhQHoH46GAYyGmlAzg2V0aMfGhNwIhlyAgmlkgnY0gmlwhDYm2FSJc2VjcDI1NmsxoQIs0r6Yt49IYXGZTzLKmV9NU6eDFy82CpIkGBw8sbSHvYRzbmFwwIN0Y3CCdl-DdWRwgnZ7", id: "496c218828d2d1864a9e228e7ad33a481ae60acb81becfb2e565053f4e1f1a5c", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:47924", remoteAddress: "54.38.216.84:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://994252f3fbe56302ba967cab1f01fada30ef8fdb335e6f974a55dd258c2052d1c8c7f181c147d3958ca7e5c7aec76f4f316f50891b137dcbcfd811e453f9d8cc@135.125.214.37:30303?discport=30340", id: "6bcba20976d073441dfdda8631ddf8fc0db9056e00485e8fe49717dac36560df", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:44738", remoteAddress: "135.125.214.37:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://29e354ff99595687d321d44b72c0e458f481046edd8d18fc5db69df0d61a44068ce9c715d74651d7c635688962f54251af861b13e5b31b4da54bb2c9f05ac794@5.9.87.183:30303?discport=30495", enr: "enr:-KO4QKwM2X_BENPlgEwVZ9SQjAMLtFF1dbJe9lmJ7eW42ai2R7ZAQ6Gc4Xzy2_BJOXsA8sESHmXeLvCGIINbAqjPxDWGAYyF1OC3g2V0aMfGhNwIhlyAgmlkgnY0gmlwhAUJV7eJc2VjcDI1NmsxoQIp41T_mVlWh9Mh1EtywORY9IEEbt2NGPxdtp3w1hpEBoRzbmFwwIN0Y3CCdl-DdWRwgncf", id: "6f1be92e4e8cb5f36e2d2e988d60d492a5992524258fab93ae146a335a8f690a", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: 
false, localAddress: "172.18.35.78:54748", remoteAddress: "5.9.87.183:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://b9e2f920d31ea6cde2ad56fcd1904455d911ccf58201551c22d41c28f5a1b1d20a67c8db30893651d8a47bfe21a95705505c079892290a8cfad06f1b8c425628@44.221.198.244:30303?discport=30316", id: "7752490f98a21bde471c9151b7bfe28347cf83a0813a9fe6e66320ae63152f5b", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:41940", remoteAddress: "44.221.198.244:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://9f1443433c1b1b79ccc2d95f314c4e0823d0b549d1db43e5e0a2fe3a87fdaeb2d693fa4a8e75fd6a77c2917598d91782fb75b8fc6357c4f13073653894418acb@66.70.207.63:30303?discport=30309", id: "8df6a54d5bc8fcac07f8ece1d738414190fc9fe3400776abb33471b9ead46344", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:53838", remoteAddress: "66.70.207.63:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://6668bb0a2ede7963ebc196f5e2c8e4daf480a1b7510b74ad18491d733ccf32ab754b44422e4d40fb88c996a3d33fa08dc96461d77693c4a7976cadef4340ca71@148.113.163.85:30303?discport=30309", id: "8e60fc39583410b077016422c96f36ecc60f077a4910a8848917dd1e5856c4e4", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:36704", remoteAddress: "148.113.163.85:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: 
"enode://298ba98e471a44af8638c297d4f25060119817d20cd49870717cfef0f92d3d3d1e3039b1b5fcd34ef66e5ef97efefb9d38e68eed20d1eec5929dfc422a3731e9@3.219.138.93:30306", id: "90871a5e7b702d78f49f829b75d44728628d6a0448d2e128dee96d3e8a39383e", name: "bor/v1.1.0/linux-amd64/go1.20.10", network: { inbound: false, localAddress: "172.18.35.78:39982", remoteAddress: "3.219.138.93:30306", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://66153dd3af7f793158934d9bd121f68e1e8c5a4c15d3316f2e222e6743f8a46fb02a3b6e70181521c0f82584ebd8b690fcf7c3056d5b78293f1bbe065f038ed9@54.235.96.140:30306", id: "93c951775b564631f98affc9e4539b91daa825e350de64a3a0b760a65d0a7826", name: "bor/v1.1.0/linux-amd64/go1.20.10", network: { inbound: false, localAddress: "172.18.35.78:37624", remoteAddress: "54.235.96.140:30306", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://697850d0a936d1d63d047ce480e6f39f429f2c33cfeec335526fb1e97aa0a11a43065bad4b0e8223ca053f91307a0a672d79586c4efdb81f531122116e6d132f@15.204.47.194:30303?discport=30340", id: "96b764ec1ca7771bdb60b464e498824b22dfc7c7cd8d8a3c28cb9ce4241d72dc", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:33882", remoteAddress: "15.204.47.194:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://a34a45e54b28eef5cc58e66a932471ffa3d914af052346b423117972aa957d0816f79492e657ccf1f356713f5959274d5f39573acde4d64e00a656ae999f0a30@65.108.127.87:30303?discport=30376", id: "9ede61e13d949a6ff325274262cf677d16093daf8be60c441707c8ba047526d3", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:47926", remoteAddress: "65.108.127.87:30303", static: false, 
trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68", "snap/1"], enode: "enode://af51799ca42c94ff9db93aa933dad4d7ae5979153658df2a38f90c38654391f8a929c8d6af7cb04ea151f009a2b163d6458a71662d512adf1d300ea49107738f@5.9.87.183:30303?discport=30432", id: "a51dc5db9ffc3dbd5b5c67ed1925a486788b5e7668ca0c624b31468b4090f000", name: "bor/v1.0.6/linux-amd64/go1.20.8", network: { inbound: false, localAddress: "172.18.35.78:40188", remoteAddress: "5.9.87.183:30303", static: false, trusted: false }, protocols: { eth: { version: 68 }, snap: { version: 1 } } }, { caps: ["eth/66", "eth/67", "eth/68"], enode: "enode://76d2d6284ee5637113e3669e0fdff0fca83535e39ee0752b9338d9e306aad3f9b4db4c8e4e8738ad718c0f442daf96a37fc864d73954f931dd3c2b3d85663766@3.239.87.70:56304", id: "c0506599f03d41572ecbc8ea45b6eee0192c622eccd7d614d3bb9a3fb19e2548", name: "Geth/v1.1.8/linux-amd64/go1.20", network: { inbound: true, localAddress: "172.18.35.78:30303", remoteAddress: "3.239.87.70:56304", static: false, trusted: false }, protocols: { eth: { version: 68 } } }, { caps: ["eth/66", "eth/67", "eth/68"], enode: "enode://e6ddc59f7f585019b428a3a076a55a2ef1401926434f798b9fb29abb5502a6b33698bfba0420642132a959051f5e417af9abf6d67dc87d8e6f8e88acdbe1532b@54.90.91.58:34482", id: "d85b17d766b71531af5a5a57065ad2baef16f75df801e34ac3e446c9ea02470d", name: "Geth/v1.1.8/linux-amd64/go1.20", network: { inbound: true, localAddress: "172.18.35.78:30303", remoteAddress: "54.90.91.58:34482", static: false, trusted: false }, protocols: { eth: { version: 68 } } }]

GeassV commented on Dec 30 '23
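Console output like the admin.peers dump above is hard to scan by eye. The admin_peers JSON-RPC method returns the same objects as JSON, including the name and network.inbound fields, so a short script can tally client versions and connection direction. A minimal sketch; summarize_peers is a hypothetical helper, and the sample keeps only three entries from the dump with most fields dropped. Counting the full dump by hand: 11 of the 16 peers report bor/v1.0.6, 3 report bor/v1.1.0, and the 2 inbound peers identify as Geth/v1.1.8.

```python
from collections import Counter


def summarize_peers(peers: list) -> dict:
    """Summarize admin_peers results by client version and connection direction."""
    # "bor/v1.0.6/linux-amd64/go1.20.8" -> "bor/v1.0.6"
    versions = Counter("/".join(p["name"].split("/")[:2]) for p in peers)
    inbound = sum(1 for p in peers if p["network"]["inbound"])
    return {
        "total": len(peers),
        "inbound": inbound,
        "outbound": len(peers) - inbound,
        "versions": dict(versions),
    }


# Three entries abbreviated from the dump above (other fields omitted).
sample = [
    {"name": "bor/v1.1.0/linux-amd64/go1.20.10", "network": {"inbound": False}},
    {"name": "bor/v1.0.6/linux-amd64/go1.20.8", "network": {"inbound": False}},
    {"name": "Geth/v1.1.8/linux-amd64/go1.20", "network": {"inbound": True}},
]
print(summarize_peers(sample))
```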

Any idea how we can solve this issue?

eldimious commented on Jan 01 '24

I tried applying the [p2p.discovery] settings from https://forum.polygon.technology/t/recommended-peer-settings-for-mainnet-nodes/13018. I will let you know if this resolves the issue.

eldimious commented on Jan 01 '24

The above suggestions did not fix the issue. Any other suggestions?

eldimious commented on Jan 04 '24

The above suggestions did not fix the issue. Any other suggestions?

No luck. I tried a new physical machine with bor 1.1.0 and Heimdall 1.0.3, bootstrapped from snapshot data, and it happened all over again: it gets stuck at random. The original machine, after weeks of manual restarts, finally ran fine for half a month. I'm not sure why, and I'm afraid it will get stuck unexpectedly again someday.

GeassV commented on Jan 11 '24

@0xKrishna I think I might have hit the same problem on two nodes. The first node stopped importing blocks ~8 days ago, the other around 2 hours ago.

Node Stopped 2 hours ago (Stopped 2024-01-16 @ 18:30:00 EST)

I have the pprof goroutine dump for it (see pprof.geth.goroutine.polygon-mainnet-0.pb.gz). It seems to be blocked at https://github.com/maticnetwork/bor/blob/master/core/blockchain.go#L1888.

Node Stopped 8 days ago (Stopped 2024-01-09 @ 12:00:00 EST)

I have a pprof dump for this one too (see pprof.geth.goroutine.polygon-mainnet-1.pb.gz). Here I can't clearly see what is blocked. I don't even seem to see the blockchain import goroutine, so I'm not sure what it was doing.

For this dump, I also captured admin.nodeInfo and admin.peers via bor attach (see pprof-polygon-mainnet-1-attach-nodeIndo-peer.txt).

Let me know if you need more info. I'll follow the nodes more closely to see if they get stuck again, so I can gather extra data points.

Extra Details

I tried to stop this node cleanly by sending a single SIGINT, then waited 4 hours, but it never shut down. I decided to force-kill it, which means that in this state the stuck node never completed its clean shutdown sequence.

maoueh commented on Jan 17 '24
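The dumps above are binary pprof profiles (.pb.gz). When a node's pprof HTTP server is enabled, the same information is available as text from Go's standard /debug/pprof/goroutine?debug=2 endpoint, which is easier to search. A minimal sketch that finds goroutines whose stack mentions a given function and reports how long they have been waiting; blocked_goroutines is a hypothetical helper, the sample dump is made up and abbreviated, and the search string core.(*BlockChain) is an assumption based on the blockchain.go link above.

```python
import re

# Header line of each goroutine in a debug=2 dump, e.g. "goroutine 42 [semacquire, 310 minutes]:"
HEADER = re.compile(r"^goroutine (\d+) \[([^\]]+)\]:$")

# Made-up, abbreviated dump in the debug=2 text format (goroutines separated by blank lines).
SAMPLE = (
    "goroutine 1 [running]:\n"
    "main.main()\n"
    "\t/app/main.go:10 +0x1d\n"
    "\n"
    "goroutine 42 [semacquire, 310 minutes]:\n"
    "sync.runtime_SemacquireMutex(0xc000123456, 0x0, 0x1)\n"
    "github.com/ethereum/go-ethereum/core.(*BlockChain).insertChain(...)\n"
)


def blocked_goroutines(dump: str, needle: str) -> list:
    """Return (goroutine id, wait state) for goroutines whose stack contains `needle`."""
    results = []
    for block in dump.strip().split("\n\n"):
        lines = block.splitlines()
        if not lines:
            continue
        match = HEADER.match(lines[0])
        if match and any(needle in line for line in lines[1:]):
            results.append((int(match.group(1)), match.group(2)))
    return results


for gid, state in blocked_goroutines(SAMPLE, "core.(*BlockChain)"):
    print(f"goroutine {gid}: {state}")
```

A wait state with a long duration (minutes or hours) on a goroutine inside the import path is the kind of signal to look for; a dump without any such goroutine, as in the second node above, suggests the import loop was not running at all.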

Same issue on two independent nodes: they get stuck at random blocks, with this error: heimdalld[14653]: ERROR[2024-01-20|20:45:38.152] Span proposed is not in-turn module=bor currentChildBlock=52556670 msgStartblock=52563456 msgEndBlock=52569855

VSGic commented on Jan 20 '24
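The numbers in that Heimdall error can be checked directly. A minimal sketch that pulls the integer fields out of the quoted line (numeric_fields is a hypothetical helper): the proposed span covers 6,400 blocks and starts 6,786 blocks after currentChildBlock. One reading, not confirmed in this thread, is that Heimdall's view of the Bor chain lags behind the span being proposed, which would fit the stalled Bor imports described above.

```python
import re

# The Heimdall error line quoted above.
LOG = ("heimdalld[14653]: ERROR[2024-01-20|20:45:38.152] Span proposed is not in-turn "
       "module=bor currentChildBlock=52556670 msgStartblock=52563456 msgEndBlock=52569855")


def numeric_fields(line: str) -> dict:
    """Extract key=<integer> pairs; non-numeric values such as module=bor are skipped."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


fields = numeric_fields(LOG)
span_length = fields["msgEndBlock"] - fields["msgStartblock"] + 1  # inclusive range
lead = fields["msgStartblock"] - fields["currentChildBlock"]
print(f"span length: {span_length} blocks, starts {lead} blocks past currentChildBlock")
```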

Hey @eldimious @VSGic @maoueh @GeassV ,

  • You can ignore the "unable to handle whitelist milestone" logs. We are working on demoting these logs to DEBUG level.
  • I can see that your nodes are peered.
  • Please downgrade your bor node to v1.1.0 and heimdall to v1.0.3.
  • Restart the clients.
  • If the issue persists, please attach a log dump (or copy the last 200 lines of the log) and the configuration used to start the nodes.

Thank you! 💜

0xsharma commented on Jan 25 '24
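For the requested last 200 lines: with Docker or journald, docker logs --tail 200 <container> and journalctl -u <unit> -n 200 already do this. For a plain log file, like the ones attached later in this thread, a small helper works too. A minimal sketch; tail is a hypothetical helper name.

```python
from collections import deque


def tail(path: str, n: int = 200) -> list:
    """Return the last n lines of a text file, keeping at most n lines in memory."""
    with open(path, encoding="utf-8", errors="replace") as f:
        # deque(maxlen=n) discards older lines as the file is read line by line.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

Usage: print("\n".join(tail("out_bor.log"))) prints the last 200 lines of a log file named out_bor.log.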

Well, it got stuck at 52755409, then moved to 52756404 and got stuck again while I was dumping the log and config files. Versions: bor v1.1.0 and heimdall v1.0.3. Attached are the log and config: output_24_1_26.log, bor_config.txt

GeassV commented on Jan 26 '24

Hello, same problem after the downgrade; regular restarts are still needed. Attached are the log and config: config_bor.txt, out_bor.log

VSGic commented on Jan 29 '24

Hello,

I have the same issue. The bor node is stuck at block number 52962568.

bor v1.1.0 
heimdall  v1.0.3

I tried to restart the bor node, but it took a long time trying to stop.

Eventually, systemd killed it because 'stop-sigterm' timed out.

After restarting, the block number rolled back to 52921882, far behind the block it was stuck at (52962568).

RyanWang0811 commented on Jan 31 '24
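Two reports above describe bor not finishing its shutdown: maoueh waited 4 hours after a single SIGINT, and systemd eventually killed RyanWang0811's node when its stop timeout expired. systemd already escalates from SIGTERM to SIGKILL after TimeoutStopSec; the sketch below spells out the same escalation for a process you manage yourself, so that a forced kill can at least be recorded. A plausible but unconfirmed link to the rollback above (52962568 to 52921882, 40,686 blocks) is that a force-killed node skips the clean-shutdown step in which geth-derived clients write cached state to disk. stop_gracefully is a hypothetical helper.

```python
import signal
import subprocess


def stop_gracefully(proc: subprocess.Popen, timeout: float, sig: int = signal.SIGINT) -> bool:
    """Ask a process to shut down, escalating to SIGKILL after `timeout` seconds.

    Returns True if the process had to be force-killed (its shutdown sequence was skipped).
    """
    proc.send_signal(sig)
    try:
        proc.wait(timeout=timeout)
        return False
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: the process gets no chance to clean up
        proc.wait()
        return True
```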

Same here. [screenshot]

inapeace0 commented on Feb 01 '24

@CaCaBlocker You can ignore these logs for now as your node is not completely synced.

VAIBHAVJINDAL3012 commented on Feb 01 '24

@RyanWang0811 Is it working now?

VAIBHAVJINDAL3012 commented on Feb 01 '24

It is working now. thx.

RyanWang0811 commented on Feb 01 '24

Hi, I still have this problem; I restart bor 3-5 times per day.

VSGic commented on Feb 05 '24

I still have this problem, too.

This issue looks like the one I posted previously, which seems not to have been fixed: https://github.com/maticnetwork/bor/issues/939

Is it a node bug, or an issue with the chain itself?

RyanWang0811 commented on Feb 10 '24

The problem is still present: two nodes running different bor versions are both affected.

VSGic commented on Feb 19 '24

Hey @RyanWang0811 @VSGic, what specific errors are you currently facing? Can you share some logs? Also, have you upgraded to bor v1.2.3?

Raneet10 commented on Feb 20 '24

Hello @Raneet10, I have posted logs above. I have two nodes, one of them on bor v1.2.3, and it has the same problem.

VSGic commented on Feb 21 '24

I encountered this problem with the latest versions on the testnet, and there is no solution yet. heimdall: v1.0.4-beta, bor: v1.2.6-beta

Excalibur-1 commented on Feb 22 '24

Hello !

Just wanted to mention that we are experiencing the same issues with our 2 Polygon bor nodes. I have set up a (k8s) liveness probe to restart a node if it gets stuck for more than 15 minutes. It kind of works, but it's really annoying, and we still get small interruptions when both nodes get stuck at the same time. This happens multiple times per day. It's really bad.

Is anything planned to fix these issues?

By the way, I compared the errors in the Heimdall and bor logs of a node stuck on a block with the logs of the other node, which was working, and found exactly the same errors in both. So the actual cause is clearly not being logged...

bgiegel commented on Feb 23 '24
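Since the logs of a stuck node look the same as those of a healthy one, a probe that watches block height is a more direct signal than log scraping. A minimal sketch of such a check, assuming the node serves JSON-RPC at http://localhost:8545 and that a small state file can persist between probe runs (both assumptions; adjust for your deployment). It reports unhealthy when eth_blockNumber has not advanced for 15 minutes, matching the threshold described above; RPC_URL, STATE_FILE and the function names are hypothetical.

```python
import json
import time
import urllib.request
from pathlib import Path
from typing import Optional, Tuple

RPC_URL = "http://localhost:8545"            # assumption: bor's HTTP-RPC endpoint
STATE_FILE = Path("/tmp/bor-liveness.json")  # hypothetical place to remember the last height
MAX_STALL_SECONDS = 15 * 60                  # same 15-minute threshold as described above


def block_number(url: str = RPC_URL) -> int:
    """Return the node's current head via the eth_blockNumber JSON-RPC call."""
    body = json.dumps({"jsonrpc": "2.0", "id": 1,
                       "method": "eth_blockNumber", "params": []}).encode()
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return int(json.load(resp)["result"], 16)


def check(state: Optional[dict], height: int, now: float,
          max_stall: float) -> Tuple[dict, bool]:
    """Return (new_state, healthy); healthy while the head advanced within max_stall seconds."""
    if state is None or height > state["height"]:
        return {"height": height, "since": now}, True
    return state, now - state["since"] < max_stall


def main() -> int:
    """Probe entry point: 0 = healthy, 1 = stalled. RPC errors raise, which also exits non-zero."""
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else None
    new_state, healthy = check(state, block_number(), time.time(), MAX_STALL_SECONDS)
    STATE_FILE.write_text(json.dumps(new_state))
    return 0 if healthy else 1
```

To use it as an exec liveness probe, call sys.exit(main()) from the probe script. Because the state file lives in /tmp inside the container, it is typically reset when the container restarts, which simply starts a new 15-minute window.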

Hello, we still have this trouble. We cannot send transactions through such a node: they get lost when the node is out of sync. We are working with Polygon in manual mode.

VSGic commented on Feb 29 '24

Hi, this is still happening and has become worse: one node cannot even get synced after a reboot and gets stuck again on the way.

VSGic commented on Mar 08 '24

I also faced this issue when bootstrapping a node from the official snapshot. It seems that removing the nodekey file fixed the problem, and sync is now progressing.

VladStarr commented on Mar 09 '24
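The nodekey file holds the node's private P2P key, and geth-derived clients generate a new one at startup when it is missing, which also gives the node a new enode ID. A minimal sketch that moves the key aside instead of deleting it, so the change can be undone; the <datadir>/bor/nodekey location is an assumption about bor's default layout, and rotate_nodekey is a hypothetical helper. Stop bor before running it, and update any static or trusted peer entries on other machines that reference the old enode.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def rotate_nodekey(datadir: str) -> Optional[Path]:
    """Move <datadir>/bor/nodekey aside so a fresh key is generated on next start.

    Returns the backup path, or None if no nodekey file was found.
    """
    key = Path(datadir) / "bor" / "nodekey"  # assumption: bor's default datadir layout
    if not key.is_file():
        return None
    backup = key.with_name(f"nodekey.bak.{int(time.time())}")
    shutil.move(str(key), str(backup))
    return backup
```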