rippled icon indicating copy to clipboard operation
rippled copied to clipboard

Use Validator List (VL) cache files in more scenarios

Open ximinez opened this issue 10 months ago • 5 comments

High Level Overview of Change

  • If any [validator_list_keys] are not available after all [validator_list_sites] have had a chance to be queried, then fall back to loading cache files. Currently, cache files are only used if no sites are defined, or the request to one of them has an error. It does not include cases where not enough sites are defined, or if a site returns an invalid VL (or something else entirely).
  • Resolves #5320

Context of Change

Validator list cache files are only used if the request to a site fails. That doesn't cover enough possible cases.

Type of Change

  • [X] Bug fix (non-breaking change which fixes an issue)
  • [X] New feature (non-breaking change which adds functionality)

API Impact

None

Before / After

Makes the following changes:

  1. If a validator site request times out (ValidatorSite::onRequestTimeout), and there is not yet a lastRefreshStatus, it sets one indicating the timeout.
  2. In ValidatorSite::setTimer, which determines which request to send next, if all of the sites have a lastRefreshStatus, calls missingSite.
    • missingSite is unchanged. It calls ValidatorList::loadLists, which returns all the cache file names for lists ([validator_list_keys]) which are unavailable, and for which a cache file exists. Those file names are then passed to ValidatorSite::load, which adds them to the sites_ list. Because those "sites" are new, they will be tried next.
  3. ValidatorSite::load checks for duplicate URIs. Since it's a vector, it's not very efficient, but this function is only called at startup and via missingSites, so it won't be called often, and not in any critical path.

Test Plan

Reproduce the scenario from #5320. Verify that the UNL becomes available within a few seconds of startup.

ximinez avatar Feb 26 '25 22:02 ximinez

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 79.5%. Comparing base (92281a4) to head (f78c633).

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##           develop   #5323   +/-   ##
=======================================
  Coverage     79.5%   79.5%           
=======================================
  Files          816     816           
  Lines        72180   72191   +11     
  Branches      8295    8264   -31     
=======================================
+ Hits         57364   57387   +23     
+ Misses       14816   14804   -12     
Files with missing lines Coverage Δ
src/xrpld/app/misc/detail/ValidatorSite.cpp 93.9% <100.0%> (+1.4%) :arrow_up:

... and 7 files with indirect coverage changes

Impacted file tree graph

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Feb 26 '25 22:02 codecov[bot]

I updated my validators.txt to include:

[validator_list_sites]
# https://vl.ripple.com/
https://vl.xrplf.org/
https://vl.xrplf.org/

[validator_list_keys]
# Ripple
ED2677ABFFD1B33AC6FBC3062B71F1E8397C1505E1C42C64D11AD1B28FF73F4734
# XRPLF
ED45D1840EE724BE327ABE9146503D5848EFD5F38B6D5FEDE71E80ACCE5E6E738B

Notice that it includes the keys for both Ripple and XRPLF, but the URL for Ripple is commented out, and the URL for XRPLF is duplicated.

$ rippled -q validator_list_sites
{
   "result" : {
      "status" : "success",
      "validator_sites" : [
         {
            "last_refresh_status" : "same_sequence",
            "last_refresh_time" : "2025-Feb-26 22:40:44.2977547 UTC",
            "next_refresh_time" : "2025-Feb-26 22:45:43.7251469 UTC",
            "refresh_interval_min" : 5,
            "uri" : "https://vl.xrplf.org/"
         },
         {
            "last_refresh_status" : "accepted",
            "last_refresh_time" : "2025-Feb-26 22:25:44.4012575 UTC",
            "next_refresh_time" : "2025-Feb-27 22:25:44.4017257 UTC",
            "refresh_interval_min" : 1440,
            "uri" : "file://[...]/cache.ED2677ABFFD1B33AC6FBC3062B71F1E8397C1505E1C42C64D11AD1B28FF73F4734"
         }
      ]
   }
}

Note that there are two sites listed here: XRPLF, and the local cache.

$ rippled -q validators
{
   "result" : {
      "publisher_lists" : [
         {
            "available" : true,
            "expiration" : "2026-Jan-17 10:09:21.0000000 UTC",
            "list" : [
               [...]
            ],
            "pubkey_publisher" : "ED2677ABFFD1B33AC6FBC3062B71F1E8397C1505E1C42C64D11AD1B28FF73F4734",
            "seq" : 81,
            "uri" : "file://[...]/cache.ED2677ABFFD1B33AC6FBC3062B71F1E8397C1505E1C42C64D11AD1B28FF73F4734",
            "version" : 2
         },
         {
            "available" : true,
            "expiration" : "2026-Jan-18 00:00:00.0000000 UTC",
            "list" : [
               [...]
            ],
            "pubkey_publisher" : "ED45D1840EE724BE327ABE9146503D5848EFD5F38B6D5FEDE71E80ACCE5E6E738B",
            "seq" : 2025011701,
            "uri" : "https://vl.xrplf.org/",
            "version" : 1
         },
[...]

Note that both lists are available, and that both indicate the same uri as shown in validator_list_sites (which does not necessarily need to be the case).

ximinez avatar Feb 26 '25 22:02 ximinez

Notice that it includes the keys for both Ripple and XRPLF, but the URL for Ripple is commented out, and the URL for XRPLF is duplicated.

What is the scenario that is tested here? Ripple's publisher list was cached at a previous point in time. Presently, that publisher list is unavailable (due to being commented out in the config file). Your changes to the code have resorted to using the cache to load the ValidatorList. Is this an accurate description of the PR?

In what other situations are you going to be using the cache?

ckeshava avatar Mar 07 '25 23:03 ckeshava

What is the scenario that is tested here?

It's demonstrating two things:

  1. That the [validator_list_sites] and [validator_list_keys] are independent. This was already true.
  2. That if all the sites included in [validator_list_sites] are queried and any of the [validator_list_keys] still do not have a valid UNL available, the cache file will be loaded. As noted in the PR description, the original behavior will not load the cache in that scenario. "Currently, cache files are only used if no sites are defined, or the request to one of them has an error. It does not include cases where not enough sites are defined, or if a site returns an invalid VL (or something else entirely)."

Ripple's publisher list was cached at a previous point in time. Presently, that publisher list is unavailable (due to being commented out in the config file). Your changes to the code have resorted to using the cache to load the ValidatorList. Is this an accurate description of the PR?

Yep.

In what other situations are you going to be using the cache?

"Currently, cache files are only used if no sites are defined, or the request to one of them has an error."

ximinez avatar Mar 11 '25 22:03 ximinez

Since this PR has been sitting for so long, I'm thinking that it should be expanded to simply always try to load the VL cache files on start up. There are two main reasons.

  1. The node can immediately start listening for trusted validations - it doesn't even have to wait for the web download to finish.
  2. It reduces the risk of processing of old VLs because the node will know about the most recent VL it wrote to a cache.

ximinez avatar Sep 03 '25 17:09 ximinez