[Enhancement] [1.10.27] JsonRPC, Metrics & HealthChecks Not Running While in DbLoad

Open Texnomic opened this issue 3 years ago • 12 comments

I'm running the Mainnet archive config and the node is in DbLoad sync mode: Syncing previously downloaded blocks from DB (partial offline mode until it finishes).

All endpoints are down: JsonRPC, Metrics & HealthChecks.

Note: the node is still processing previously downloaded blocks.

Config File:

{
  "Init": {
    "WebSocketsEnabled": true,
    "StoreReceipts": true,
    "IsMining": false,
    "ChainSpecPath": "chainspec/foundation.json",
    "GenesisHash": "0xd4e56740f876aef8c010b86a40d5f56745a118d0906a34e69aec8c0db1cb8fa3",
    "BaseDbPath": "nethermind_db/mainnet_archive",
    "LogFileName": "mainnet_archive.logs.txt",
    "MemoryHint": 10240000000
  },
  "Network": {
    "DiscoveryPort": 30303,
    "P2PPort": 30303,
    "ActivePeersMaxCount": 200
  },
  "JsonRpc": {
    "Enabled": true,
    "Timeout": 20000,
    "Host": "127.0.0.1",
    "Port": 8545
  },
  "TxPool": {
    "Size": 2048
  },
  "Db": {
    "CacheIndexAndFilterBlocks": true
  },
  "Sync": {
    "DownloadBodiesInFastSync": false,
    "DownloadReceiptsInFastSync": false,
    "UseGethLimitsInFastBlocks": true
  },
  "EthStats": {
    "Enabled": false,
    "Server": "wss://ethstats.net/api",
    "Name": "Nethermind",
    "Secret": "secret",
    "Contact": "[email protected]"
  },
  "Metrics": {
    "NodeName": "Nethermind",
    "Enabled": true,
    "PushGatewayUrl": "http://10.0.0.33:9091/metrics",
    "IntervalSeconds": 5
  },
  "HealthChecks": {
    "Enabled": true,
    "WebhooksEnabled": false,
    "WebhooksUri": "https://slack.webhook",
    "UIEnabled": true,
    "PollingInterval": 5,
    "Slug": "/api/health",
    "MaxIntervalWithoutProcessedBlock": 15,
    "MaxIntervalWithoutProducedBlock": 45
  }
}
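
The symptom described above can be confirmed from outside the node. The following is only a minimal sketch, not part of Nethermind: it assumes the JSON-RPC host and port from the config above (127.0.0.1:8545) and uses the standard Ethereum JSON-RPC method eth_syncing. The function names (port_is_open, build_rpc_request, probe_eth_syncing) are made up for illustration.

```python
import json
import socket
import urllib.error
import urllib.request


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_rpc_request(method: str, params=None, request_id: int = 1) -> bytes:
    """Encode a JSON-RPC 2.0 request body."""
    payload = {
        "jsonrpc": "2.0",
        "method": method,
        "params": params if params is not None else [],
        "id": request_id,
    }
    return json.dumps(payload).encode("utf-8")


def probe_eth_syncing(url: str, timeout: float = 5.0) -> dict:
    """Call eth_syncing and report whether the endpoint answered at all."""
    request = urllib.request.Request(
        url,
        data=build_rpc_request("eth_syncing"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError) as error:
        # Connection refused, timeout, or a non-JSON answer all count as "down".
        return {"reachable": False, "error": str(error)}
    return {"reachable": True, "result": body.get("result")}
```

Calling port_is_open("127.0.0.1", 8545) and probe_eth_syncing("http://127.0.0.1:8545") while the node is in DbLoad should show whether the port is closed entirely or merely not answering. Keeping "reachable" separate from "result" matters because eth_syncing legitimately returns false once a node is in sync.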

Texnomic avatar Mar 08 '21 13:03 Texnomic

Hi @Texnomic - this was an early design decision: we stop serving all calls while we sync from the DB. Syncing from the DB happens when you stop the node during archive sync after the blocks have already been downloaded from the network. Since these blocks can then be processed entirely offline, we shut down networking while doing so to speed up the sync.

We have seen in the past that this behaviour was undesirable for some users, so if you could share your opinion here we would appreciate it, as we may change this behaviour.

tkstanczak avatar Mar 08 '21 13:03 tkstanczak

@tkstanczak I can understand the design decision, but at least Health Checks & Monitoring should be enabled. Otherwise the node is completely silent :)

Texnomic avatar Mar 08 '21 13:03 Texnomic

@Texnomic can you please have a look at #3680 and check whether it solves the issue?

dB2510 avatar Dec 11 '21 15:12 dB2510

@tkstanczak @LukaszRozmej @dB2510 hello there!

I have the same issue on Nethermind client version 1.13.3. Is it fixed or not?

Why I'm asking: when I ran the Nethermind client in archive mode for the first time, JsonRpc initialized correctly. But whenever I restart the client for any reason, JsonRpc is not initialized, even though the logs show that the node is syncing.

Configuration:

{
  "Init": {
    "DiscoveryEnabled": true,
    "WebSocketsEnabled": true,
    "StoreReceipts": true,
    "ChainSpecPath": "chainspec/fuse.json",
    "BaseDbPath": "nethermind_db/fuse_archive",
    "LogFileName": "fuse_archive.logs.txt",
    "StaticNodesPath": "Data/static-nodes-fuse.json"
  },
  "Network": {
    "DiscoveryPort": 30303,
    "P2PPort": 30303,
    "LocalIp": "0.0.0.0",
    "ExternalIp": "0.0.0.0"
  },
  "JsonRpc": {
    "Enabled": true,
    "Timeout": 20000,
    "Host": "0.0.0.0",
    "Port": 8545,
    "WebSocketsPort": 8546
  },
  "Metrics": {
    "NodeName": "Fuse_archive"
  },
  "Bloom": {
    "IndexLevelBucketSizes": [
      16,
      16,
      16
    ]
  },
  "Pruning": {
    "Mode": "None"
  },
  "Mining": {
    "MinGasPrice": "10000000000"
  }
}

Thank you!

AliakseiMalyshau avatar Sep 23 '22 09:09 AliakseiMalyshau

It's a severe issue. I can't believe it was done intentionally! I spent half of the night trying to understand what was wrong with my node and configs.

People use monitoring software to check sync status, and the node can shut down its RPC endpoint for days or weeks. It's literally unacceptable behavior for most applications and a huge argument against using this software.

begetan avatar Oct 05 '22 06:10 begetan

I spent more than a week before I found this link and learned that RPC is not listening while syncing old blocks. This needs to be addressed to avoid any confusion.

crypto0243 avatar Oct 11 '22 04:10 crypto0243

I ran into the same problem. Is there a way to know how many blocks the node needs to sync before going back online with RPC etc. enabled?

MaxTeiger avatar Oct 20 '22 09:10 MaxTeiger

I want to make some admin RPC calls while syncing, and that is impossible too.

Jack-Works avatar May 01 '23 00:05 Jack-Works

It would also be very helpful if the logs said something more informative than "Syncing previously downloaded blocks from DB (partial offline mode until it finishes)". For example, how far along are we, and will it take half an hour or weeks?
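
To make the "half an hour or weeks" question concrete, here is a rough sketch of the arithmetic such a progress message could be based on. It is not what Nethermind does; it assumes you can obtain two (timestamp, block number) samples and a target block by some means, and the function name estimate_remaining is hypothetical. The estimate is a plain linear extrapolation, so it will be off when block processing speed varies.

```python
from datetime import timedelta
from typing import Optional


def estimate_remaining(
    t0: float, block0: int, t1: float, block1: int, target_block: int
) -> Optional[timedelta]:
    """Linearly extrapolate the time left until target_block.

    t0 and t1 are timestamps in seconds, block0 and block1 the block
    numbers observed at those times. Returns None if no progress was
    made between the two samples, because no rate can be derived then.
    """
    if t1 <= t0:
        raise ValueError("second sample must be later than the first")
    rate = (block1 - block0) / (t1 - t0)  # blocks per second
    if rate <= 0:
        return None
    remaining_blocks = max(target_block - block1, 0)
    return timedelta(seconds=remaining_blocks / rate)
```

For example, two samples taken 60 seconds apart at blocks 1000 and 1600 give a rate of 10 blocks per second, so a target of block 2600 would be about 100 seconds away.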

bazzilic avatar Jan 22 '24 05:01 bazzilic

@LukaszRozmej I think it no longer impacts performance that much, right? At least on archive it will not be that visible, since archive is not the fastest in the current design, so maybe we could just enable it again?

@Demuirgos you were enabling JsonRpc earlier in node startup, so maybe you would like to pick this one up as well?

kamilchodola avatar Jun 03 '24 21:06 kamilchodola

I remember this being changed back and forth. Can you check the current state?

LukaszRozmej avatar Jun 04 '24 13:06 LukaszRozmej

I will do it as follows:

Start NodeA, which will have processing disabled and will download about 5 million blocks.
Start NodeB and download just a few blocks (about 500k).
Start NodeC and let it process just a minimal number of blocks (to make sure it has started processing and is not stuck on BeaconHeaders).

Then stop all of them, restart, and see how fast they reach about 2 million blocks.

Not sure if there is a better test case for that.
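
The "how fast do they reach a given block" measurement could be scripted along these lines. This is only a sketch under assumptions: time_to_reach is a hypothetical helper, and get_block_number is whatever callable returns the node's current block height (for instance a wrapper around the standard eth_blockNumber JSON-RPC call, if the endpoint is up at that point).

```python
import time
from typing import Callable, Optional


def time_to_reach(
    get_block_number: Callable[[], int],
    target_block: int,
    poll_seconds: float = 5.0,
    timeout_seconds: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll the block height until target_block is reached.

    Returns the elapsed seconds, or None if timeout_seconds passed first.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    start = clock()
    while True:
        if get_block_number() >= target_block:
            return clock() - start
        if timeout_seconds is not None and clock() - start >= timeout_seconds:
            return None
        sleep(poll_seconds)
```

Running the same helper against each node after the restart, with the same target block, would give directly comparable numbers. The resolution is limited by poll_seconds, which should not matter for runs that take hours.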

kamilchodola avatar Jun 04 '24 14:06 kamilchodola