
one of the nodes falls behind by several hundred blocks

Open ToShared opened this issue 7 months ago • 4 comments

I am using Sui version 1.5, and I have three nodes on the same local network. Occasionally, one of the nodes falls behind by several hundred blocks. There are no communication issues between the nodes, and all servers can communicate with each other without any problems.

some logs:

2025-05-20T06:56:22.890589Z TRACE sui_network::state_sync: get_latest_checkpoint_summary request failed: Status { status: Unknown, headers: {}, message: Some("unknown error: closed by peer: connection closed (code 0)"), peer_id: None, source: Some(closed by peer: connection closed (code 0)

Stack backtrace:
   0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
   1: <anemo::network::peer::Peer as tower_service::Service<anemo::types::request::Request<bytes::bytes::Bytes>>>::call::{{closure}}::{{closure}}
   2: <anemo_tower::trace::future::ResponseFuture<Fut,ClassifierT,OnResponseT,OnFailureT> as core::future::future::Future>::poll
   3: <anemo::middleware::timeout::outbound::ResponseFuture<F> as core::future::future::Future>::poll
   4: sui_network::state_sync::generated::state_sync_client::StateSyncClient<T>::get_checkpoint_summary::{{closure}}
   5: sui_network::state_sync::get_latest_from_peer::{{closure}}
   6: tokio::runtime::task::core::Core<T,S>::poll
   7: tokio::runtime::task::harness::Harness<T,S>::poll
   8: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
   9: tokio::runtime::scheduler::multi_thread::worker::Context::run
  10: tokio::runtime::context::runtime::enter_runtime
  11: tokio::runtime::task::core::Core<T,S>::poll
  12: tokio::runtime::task::harness::Harness<T,S>::poll
  13: tokio::runtime::blocking::pool::Inner::run
  14: std::sys::backtrace::__rust_begin_short_backtrace
  15: core::ops::function::FnOnce::call_once{{vtable.shim}}
  16: std::sys::pal::unix::thread::Thread::new::thread_start
  17:
  18:
) }
2025-05-20T06:56:22.890687Z TRACE drive{id=6}:send{space=Da

Dustin, [2025/5/20 14:57]
2025-05-20T06:56:20.425664Z TRACE drive{id=5}:recv{space=Data pn=39}: quinn_proto::connection: got frame Close(Application(ApplicationClose { error_code: 0, reason: b"connection closed" }))
2025-05-20T06:56:20.425669Z TRACE drive{id=5}: quinn_proto::connection: connection closed
2025-05-20T06:56:20.425706Z TRACE drive{id=5}:send{space=Data pn=30}: quinn_proto::connection: sending CONNECTION_CLOSE
2025-05-20T06:56:20.425709Z TRACE drive{id=5}:send{space=Data pn=30}: quinn_proto::connection: ACK ArrayRangeSet([36..40]), Delay = 0us
2025-05-20T06:56:20.425715Z TRACE drive{id=5}: quinn_proto::connection: sending 38 bytes in 1 datagrams
2025-05-20T06:56:20.425728Z TRACE anemo::network::request_handler: error listening for incoming uni streams: closed by peer: connection closed (code 0)
2025-05-20T06:56:20.425731Z TRACE anemo::connection: Closing Connection
2025-05-20T06:56:20.425735Z DEBUG anemo::network::request_handler: InboundRequestHandler ended peer=******* (peer ID redacted by me)

ToShared avatar May 21 '25 07:05 ToShared

Thank you for opening this issue, a team member will review it shortly. Until then, please do not interact with any users that claim to be from Sui support and do not click on any links!

github-actions[bot] avatar May 21 '25 07:05 github-actions[bot]

Due to certain reasons, we cannot directly upgrade the version.

ToShared avatar May 21 '25 07:05 ToShared

I'm surprised sui-node version 1.5 can still sync data. Or did you build sui-node from main? What are the values of last_executed_checkpoint on all the nodes?

Also, which addresses are in the seed peer list?
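
One way to gather the `last_executed_checkpoint` values is to read them from each node's metrics endpoint and compare them. The following is a minimal sketch, not an official tool. It assumes each node serves Prometheus text-format metrics at `http://<host>:<metrics-port>/metrics` and that the gauge is named `last_executed_checkpoint`; the host names are placeholders, and the helper names (`parse_metric`, `checkpoint_lag`, `fetch_metrics`) are made up for illustration.

```python
import re
import urllib.request


def parse_metric(text, name):
    """Return the first sample of `name` from Prometheus text output, or None."""
    # Match "name value" or "name{labels} value" at the start of a line.
    # Comment lines ("# HELP ...") and longer metric names do not match.
    pattern = re.compile(
        r"^" + re.escape(name) + r"(?:\{[^}]*\})?\s+(\S+)", re.MULTILINE
    )
    match = pattern.search(text)
    return int(float(match.group(1))) if match else None


def checkpoint_lag(values):
    """Map each node to how many checkpoints it trails the most advanced node."""
    known = {node: v for node, v in values.items() if v is not None}
    if not known:
        return {}
    head = max(known.values())
    return {node: head - v for node, v in known.items()}


def fetch_metrics(url, timeout=5.0):
    """Download the raw metrics text from one node."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder hosts; replace with the real metrics addresses of the nodes.
    nodes = {
        "node-a": "http://node-a.example:32931/metrics",
        "node-b": "http://node-b.example:32931/metrics",
        "node-c": "http://node-c.example:32931/metrics",
    }
    values = {}
    for node, url in nodes.items():
        try:
            values[node] = parse_metric(fetch_metrics(url), "last_executed_checkpoint")
        except OSError as err:
            print(f"{node}: could not fetch metrics ({err})")
            values[node] = None
    for node, lag in checkpoint_lag(values).items():
        print(f"{node}: checkpoint={values[node]} lag={lag}")
```

Running this periodically on all three nodes would show whether the same node is always the one that falls behind, and by how much.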

mwtian avatar May 22 '25 16:05 mwtian

> I'm surprised sui-node version 1.5 can still sync data. Or did you build sui-node from main? What are the values of last_executed_checkpoint on all the nodes?
>
> Also, which addresses are in the seed peer list?

We modified part of the source code a long time ago to remove transaction fees. Part of our configuration is as follows:

db-path: /data/sui/node-data/full_node_db/xxx
network-address: /ip4/127.0.0.1/tcp/40033/http
json-rpc-address: "0.0.0.0:9000"
metrics-address: "0.0.0.0:32931"
admin-interface-port: 39423
enable-event-processing: true
enable-index-processing: true
grpc-load-shed: ~
grpc-concurrency-limit: ~
p2p-config:
  listen-address: "0.0.0.0:42141"
  external-address: /ip4/10.xx.x.xx/udp/42141
  seed-peers:
    - peer-id: xxx
      address: /ip4/10.xx.x.xx/udp/10002
    - peer-id: xxx
      address: /ip4/10.xx.x.xx/udp/10002
    - peer-id: xxx
      address: /ip4/10.xx.x.xx/udp/10002
    - peer-id: xxx
      address: /ip4/10.xx.x.xx/udp/10002
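
When comparing a seed-peer list like this against each node's `external-address`, it can help to split the multiaddr strings into host, transport, and port. The sketch below is only an illustration under that assumption: `parse_multiaddr` and `non_udp_peers` are hypothetical helpers, not part of Sui, and they handle only the simple `/ip4/<host>/<transport>/<port>` shape seen in the configuration above. The UDP check reflects that the logs above show the peer-to-peer layer running over QUIC (`quinn_proto`).

```python
def parse_multiaddr(addr):
    """Split '/ip4/<host>/<transport>/<port>[/...]' into its parts."""
    parts = addr.strip("/").split("/")
    if len(parts) < 4:
        raise ValueError(f"unexpected multiaddr shape: {addr!r}")
    ip_proto, host, transport, port = parts[:4]
    return {
        "ip_proto": ip_proto,
        "host": host,
        "transport": transport,
        "port": int(port),
    }


def non_udp_peers(addresses):
    """Return the seed-peer addresses whose transport is not UDP."""
    return [a for a in addresses if parse_multiaddr(a)["transport"] != "udp"]


if __name__ == "__main__":
    # Addresses copied from the configuration above (IPs are redacted there).
    seeds = ["/ip4/10.xx.x.xx/udp/10002", "/ip4/10.xx.x.xx/udp/10002"]
    for seed in seeds:
        print(parse_multiaddr(seed))
    print("non-UDP seed peers:", non_udp_peers(seeds))
```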

ToShared avatar May 26 '25 06:05 ToShared

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Jul 26 '25 02:07 github-actions[bot]