rippled icon indicating copy to clipboard operation
rippled copied to clipboard

"server_state" is full, but "complete_ledgers" is empty (Version: 2.1.0-rc1)

Open sgramkumar opened this issue 1 year ago • 4 comments

Issue Description

On a host with no existing db, installing rippled and waiting for the db to sync up, results in "server_state" full, but "complete_ledgers" empty.

Steps to Reproduce

  1. Install rippled 2.1.0-rc1
  2. Bring up rippled to join devnet
  3. Wait for db to sync up
  • seq matching with public devnet dns
  • server_state=full/proposing

Expected Result

As in https://xrpl.org/connect-your-rippled-to-the-xrp-test-net.html#5-verify-that-your-server-syncs, wait for "seq" from local host to match with public network dns. Ideally the db should be synced now. Additionallywaiting for "server_state" to be "full" confirms the db is synced

Actual Result

After "seq" matched with public dns, and "server_state" is full, the issue is, "complete_ledgers" is "empty"

Wait for rippled db to sync up (attempt: 1/2)...
  [Time elapsed: 0 seconds] Sequence:  2 (local) vs  4077157 (devnet - 35.85.8.19)
  [Time elapsed: 20 seconds] Sequence:  4 (local) vs  4077164 (devnet - 35.85.20.122)
  [Time elapsed: 40 seconds] Sequence:  11 (local) vs  4077171 (devnet - 35.85.8.19)
  [Time elapsed: 60 seconds] Sequence:  11 (local) vs  4077177 (devnet - 35.85.20.122)
  [Time elapsed: 80 seconds] Sequence:  11 (local) vs  4077184 (devnet - 35.85.8.19)
  [Time elapsed: 100 seconds] Sequence:  13 (local) vs  4077191 (devnet - 35.85.20.122)
  [Time elapsed: 120 seconds] Sequence:  17 (local) vs  4077197 (devnet - 35.85.8.19)
  [Time elapsed: 140 seconds] Sequence:  17 (local) vs  4077204 (devnet - 35.85.20.122)
  [Time elapsed: 160 seconds] Sequence:  17 (local) vs  4077211 (devnet - 35.85.8.19)
  [Time elapsed: 180 seconds] Sequence:  24 (local) vs  4077217 (devnet - 35.85.20.122)
  [Time elapsed: 200 seconds] Sequence:  24 (local) vs  4077224 (devnet - 35.85.8.19)
  [Time elapsed: 220 seconds] Sequence:  24 (local) vs  4077231 (devnet - 35.85.20.122)
  [Time elapsed: 240 seconds] Sequence:  26 (local) vs  4077237 (devnet - 35.85.8.19)
  [Time elapsed: 260 seconds] Sequence:  31 (local) vs  4077244 (devnet - 35.85.20.122)
  [Time elapsed: 280 seconds] Sequence:  31 (local) vs  4077251 (devnet - 35.85.8.19)
  [Time elapsed: 300 seconds] Sequence:  31 (local) vs  4077258 (devnet - 35.85.20.122)
  [Time elapsed: 320 seconds] Sequence:  36 (local) vs  4077264 (devnet - 35.85.8.19)
  [Time elapsed: 340 seconds] Sequence:  38 (local) vs  4077271 (devnet - 35.85.20.122)
  [Time elapsed: 360 seconds] Sequence:  38 (local) vs  4077278 (devnet - 35.85.8.19)
  [Time elapsed: 380 seconds] Sequence:  38 (local) vs  4077284 (devnet - 35.85.20.122)
  [Time elapsed: 400 seconds] Sequence:  44 (local) vs  4077291 (devnet - 35.85.8.19)
  [Time elapsed: 420 seconds] Sequence:  44 (local) vs  4077297 (devnet - 35.85.20.122)
  [Time elapsed: 440 seconds] Sequence:  44 (local) vs  4077304 (devnet - 35.85.8.19)
  [Time elapsed: 460 seconds] Sequence:  49 (local) vs  4077311 (devnet - 35.85.20.122)
  [Time elapsed: 480 seconds] Sequence:  49 (local) vs  4077317 (devnet - 35.85.8.19)
  [Time elapsed: 500 seconds] Sequence:  49 (local) vs  4077324 (devnet - 35.85.20.122)
  [Time elapsed: 520 seconds] Sequence:  52 (local) vs  4077331 (devnet - 35.85.8.19)
  [Time elapsed: 540 seconds] Sequence:  54 (local) vs  4077337 (devnet - 35.85.20.122)
  [Time elapsed: 560 seconds] Sequence:  54 (local) vs  4077344 (devnet - 35.85.8.19)
  [Time elapsed: 580 seconds] Sequence:  55 (local) vs  4077351 (devnet - 35.85.20.122)
  [Time elapsed: 600 seconds] Sequence:  59 (local) vs  4077357 (devnet - 35.85.8.19)
  [Time elapsed: 620 seconds] Sequence:  59 (local) vs  4077364 (devnet - 35.85.20.122)
  [Time elapsed: 640 seconds] Sequence:  59 (local) vs  4077371 (devnet - 35.85.8.19)
  [Time elapsed: 660 seconds] Sequence:  4077377 (local) vs  4077377 (devnet - 35.85.20.122)

**** server_state: "full", complete_ledgers: "empty",

Environment

devnet

Supporting Files

rippled.log: https://gist.github.com/sgramkumar/ad5e49ad9ffd3648799aa3f7155bb45f

server_info output

Loading: "/opt/ripple/etc/rippled.cfg"
2024-Feb-09 08:27:07.771282325 UTC HTTPClient:NFO Connecting to 127.0.0.1:51234

{
   "result" : {
      "info" : {
         "build_version" : "2.1.0-rc1",
         "complete_ledgers" : "empty",
         "hostid" : "runner-6s2nbvqf-project-1346-concurrent-0",
         "initial_sync_duration_us" : "665368053",
         "io_latency_ms" : 1,
         "jq_trans_overflow" : "0",
         "last_close" : {
            "converge_time_s" : 2,
            "proposers" : 10
         },
         "load" : {
            "job_types" : [
               {
                  "in_progress" : 1,
                  "job_type" : "clientRPC"
               },
               {
                  "job_type" : "untrustedValidation",
                  "per_second" : 3
               },
               {
                  "in_progress" : 1,
                  "job_type" : "updatePaths"
               },
               {
                  "avg_time" : 1,
                  "job_type" : "ledgerData",
                  "peak_time" : 26,
                  "per_second" : 29
               },
               {
                  "job_type" : "advanceLedger",
                  "peak_time" : 2,
                  "per_second" : 3
               },
               {
                  "job_type" : "trustedValidation",
                  "per_second" : 2
               },
               {
                  "job_type" : "trustedProposal",
                  "per_second" : 1
               },
               {
                  "job_type" : "peerCommand",
                  "per_second" : 37
               },
               {
                  "job_type" : "SyncReadNode",
                  "per_second" : 28351
               },
               {
                  "job_type" : "AsyncReadNode",
                  "peak_time" : 5,
                  "per_second" : 13467
               },
               {
                  "job_type" : "WriteNode",
                  "peak_time" : 1,
                  "per_second" : 6125
               }
            ],
            "threads" : 6
         },
         "load_factor" : 1,
         "network_id" : 2,
         "node_size" : "medium",
         "peer_disconnects" : "19",
         "peer_disconnects_resources" : "0",
         "peers" : 1,
         "ports" : [
            {
               "port" : "51234",
               "protocol" : [ "http" ]
            },
            {
               "port" : "51235",
               "protocol" : [ "peer" ]
            },
            {
               "port" : "6006",
               "protocol" : [ "ws" ]
            }
         ],
         "pubkey_node" : "n9JZ4YtPxFfD9o3VsrqZrXaKyBCEB7p1f3Hafh6WzBKhCtAND7cM",
         "pubkey_validator" : "none",
         "server_state" : "full",
         "server_state_duration_us" : "1024720",
         "state_accounting" : {
            "connected" : {
               "duration_us" : "174130926",
               "transitions" : "15"
            },
            "disconnected" : {
               "duration_us" : "490236184",
               "transitions" : "15"
            },
            "full" : {
               "duration_us" : "1024720",
               "transitions" : "1"
            },
            "syncing" : {
               "duration_us" : "1000922",
               "transitions" : "1"
            },
            "tracking" : {
               "duration_us" : "5",
               "transitions" : "1"
            }
         },
         "time" : "2024-Feb-09 08:27:07.771664 UTC",
         "uptime" : 666,
         "validated_ledger" : {
            "age" : 3,
            "base_fee_xrp" : 1e-05,
            "hash" : "331471338D1CE8D3FA81836C609C005952529DE1A70B02CD47E5D43F6BB791D4",
            "reserve_base_xrp" : 10,
            "reserve_inc_xrp" : 2,
            "seq" : 4077377
         },
         "validation_quorum" : 8,
         "validator_list" : {
            "count" : 1,
            "expiration" : "2031-Aug-07 19:36:22.000000000 UTC",
            "status" : "active"
         }
      },
      "status" : "success"
   }
}

sgramkumar avatar Feb 09 '24 17:02 sgramkumar

Setup pipeline to capture the transition from "empty" to "non-empty" when server_state is full

Example 1:

..
  [Time elapsed: 160 seconds] Sequence:  4494078 (local) vs  4494078 (devnet - 35.85.20.122)
**** Sat Feb 24 03:05:12 UTC 2024 (1708743912.531117) server_state: "full", complete_ledgers: "empty",
**** Wait for 'complete_ledgers' to be non-empty
..
..
**** Sat Feb 24 03:06:28 UTC 2024 (1708743988.929838) server_state: "full", complete_ledgers: "4494056-4494103",

Time taken for the transition: 3:05:12 to 3:06:28. That is 1 minutes 16 seconds

Example 2:

  [Time elapsed: 240 seconds] Sequence:  4517165 (local) vs  4517165 (devnet - 35.85.20.122)
**** Sat Feb 24 22:49:55 UTC 2024 (1708814995.562676) server_state: "full", complete_ledgers: "empty",
**** Wait for 'complete_ledgers' to be non-empty
..
**** Sat Feb 24 22:50:50 UTC 2024 (1708815050.555848) server_state: "full", complete_ledgers: "4517153-4517155",

Time taken for the transition: 22:49:55 to 22:50:50. That is 55 seconds

sgramkumar avatar Feb 27 '24 21:02 sgramkumar

Thank you for finding that it can take a significant amount of time (about a minute) for a synced server to get complete ledgers.

When using rippled, clients should generally wait for complete_ledgers to be non-empty before proceeding with RPC calls/queries. This should be clarified in the docs.

Alternatively, we could make a change to the definition of server_state: full. The potential change would be to not report full until complete_ledgers is non-empty.

intelliot avatar Feb 29 '24 18:02 intelliot

I may have a little insight that I can contribute here. I'm going to guess that the syncing @sgramkumar is doing is without a pre-existing database. If that's incorrect, then the following conjecture is off base.

I very seldom run rippled without a pre-existing database. But just today rippled told me that my database was corrupt, so I wiped my database. The startup behavior I saw, without a pre-existing database, was surprising to me. The server reported that it was proposing very quickly, like under a minute. But it wasn't stable. Before it got around to having a "published_ledger" it had 270 transitions out of "full". I never see anything like that when I have a pre-existing database. If I have a pre-existing database I'll have at most 2 or 3 transitions out of "full".

I suspect this is relatively new behavior? Like, new this year? I have started rippled with a fresh database in the past and it takes a while to sync, but it didn't jump to "full" until it was actually ready.

What's relevant for this issue is that we may be looking at behavior that is new and only occurs when starting without a database.

scottschurr avatar Mar 01 '24 23:03 scottschurr