
Bug: celestia node bridge failed to start

Chainode opened this issue 5 months ago · 1 comment

Celestia Node version

v0.23.1-mocha

OS

Ubuntu 22.04

Install tools

No response

Others

No response

Steps to reproduce it

The scenario is as follows:

  • The celestia archive node (celestia-app) runs on a separate server from the celestia bridge node;
  • Both binaries were built from source;
  • The celestia-app was upgraded first to v4.0.2-mocha, and the required `grpc_laddr` config was added;
  • The celestia bridge node was upgraded to v0.23.1-mocha and its config was reinitialised as suggested.
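For context, the settings touched during the upgrade live in the consensus node's CometBFT `config.toml` under the `[rpc]` section. A minimal sketch is below; the bind addresses and the gRPC port are illustrative placeholders, not the reporter's actual values:

```toml
# CometBFT config.toml on the celestia-app (consensus/archive) node.
# Addresses and ports below are placeholders for illustration.
[rpc]
# CometBFT RPC listen address (default port 26657)
laddr = "tcp://0.0.0.0:26657"

# gRPC listen address added as part of the v4.x upgrade
# (empty string means disabled; port shown here is a placeholder)
grpc_laddr = "tcp://0.0.0.0:9098"
```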

After the restart, the celestia bridge node failed with:

`Error: node: failed to start: failed to get node info: rpc error: code = Unknown desc = post failed: Post "http://localhost:26657": dial tcp 127.0.0.1:26657: connect: connection refused`

All configs and flags were verified, and nothing in the bridge node's configuration referenced 127.0.0.1:26657. (The full log, including the preceding badger output, is under "Relevant log output" below.)

Expected result

The bridge node should start without issues after following the release notes and using the matching binary version. Any relevant changes to endpoints, ports, and flags should be documented and made visible.

Actual result

Workaround: After digging deeper, one idea was to change the RPC exposed behind the firewall on the celestia-app archive node from the public IP back to 127.0.0.1. This made the celestia bridge node start and function properly again.
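In config terms, the workaround amounts to binding the consensus RPC back to loopback in the archive node's `config.toml`. A sketch (the "before" address is a placeholder for the public IP that was in use):

```toml
[rpc]
# Before (RPC exposed on the public interface, behind the firewall):
# laddr = "tcp://<public-ip>:26657"

# After (workaround): bind back to loopback, which let the bridge start
laddr = "tcp://127.0.0.1:26657"
```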

Proposed solution: It appears there is internal logic that by default calls the RPC at the endpoint 127.0.0.1:26657. It should be possible to change this behaviour/endpoint via flags or config parameters, and this information should be included in the release notes. Otherwise, more complex setups will break whenever non-default endpoints or ports are used.

Relevant log output

2025-06-04T23:47:48.247Z        INFO    badger4 [email protected]/db.go:616
Level 0 [ ]: NumTables: 00. Size: 0 B of 0 B. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 16 MiB
Level 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 2 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 3 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 4 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 5 [B]: NumTables: 28. Size: 84 MiB of 87 MiB. Score: 0.00->0.00 StaleData: 27 MiB Target FileSize: 4.0 MiB
Level 6 [ ]: NumTables: 149. Size: 868 MiB of 868 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 8.0 MiB
Level Done

Error: node: failed to start: failed to get node info: rpc error: code = Unknown desc = post failed: Post "http://localhost:26657": dial tcp 127.0.0.1:26657: connect: connection refused

Is the node "stuck"? Has it stopped syncing?

No response

Notes

No response

Chainode · Jun 05 '25 10:06

We had the same error, and it was fixed by switching the RPC port on the archive node (consensus RPC on mocha-4) back to the default 26657. We had been using another port, `laddr = "tcp://0.0.0.0:26757"`, because we run multiple tendermint chains on this server. It feels like there is some mechanism where gRPC requests need to query the RPC, and it is hardcoded to the default 26657. We could not find a flag to configure this to another port on our RPC node.
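Concretely, the change that made the bridge start again was reverting the non-default RPC port in the archive node's `config.toml` (sketch, using the values quoted above):

```toml
[rpc]
# Non-default port used to co-host multiple tendermint chains -- broke the bridge:
# laddr = "tcp://0.0.0.0:26757"

# Reverted to the default the bridge appears to expect:
laddr = "tcp://0.0.0.0:26657"
```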

rares-e2s · Jun 05 '25 10:06