Not regaining peers until restart (Version: 2.4)

Open brettmollin opened this issue 9 months ago • 6 comments

Issue Description

Multiple validator operators report that they consistently lose peers over time and don't regain them until a restart.

Steps to Reproduce

Run rippled configured as a validator (being a validator may not be required) and monitor peers from the first network sync. Start with the default hub config. Restart after a day.

Expected Result

Dropped peers will be regained over time. A restart has no magic effect on peering.

Actual Result

Peers slowly go down over time. The peers will not be regained until service is restarted.

Environment

Mainnet, production master releases 2.3 and 2.4. Could be related to having fixed IPs configured ([ips_fixed]), as many validators do.

Supporting Files

brettmollin avatar Mar 28 '25 15:03 brettmollin

Quick thoughts:

  1. When a peer is dropped, its resource charge is still tracked and decreases over time.
  2. If a peer reconnects before the charge gets below an acceptable value, it will be dropped again, and charged for that drop.
  3. I think that for a "default" node, there are enough other IPs to choose from that if one drops me, I'll have plenty of others to choose from. Fixed IPs want to reestablish those connections.
  4. I bet they're not waiting long enough.
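To make points 1, 2, and 4 concrete, here is a rough sketch of the feedback loop. It is not rippled's actual resource manager: the exponential decay and the 30-second half-life are assumptions made up for illustration, and `seconds_until_below` and `simulate` are hypothetical helpers. Only the 25000 threshold and the 6000 charge come from real log output.

```python
import math

DROP_THRESHOLD = 25000  # "drop threshold 25000" in the Resource:WRN log line
DROP_CHARGE = 6000      # "Charging ... for dropped ($6000)"
HALF_LIFE = 30.0        # ASSUMPTION: hypothetical decay half-life, in seconds


def seconds_until_below(balance, threshold=DROP_THRESHOLD, half_life=HALF_LIFE):
    """Time an exponentially decaying balance needs to fall below threshold."""
    if balance < threshold:
        return 0.0
    return half_life * math.log2(balance / threshold)


def simulate(balance, retry_interval, attempts, half_life=HALF_LIFE):
    """Replay reconnect attempts made every retry_interval seconds.

    Each attempt first decays the balance. If it is still at or above the
    threshold, the peer is dropped again and charged for that drop.
    """
    history = []
    for _ in range(attempts):
        balance *= 0.5 ** (retry_interval / half_life)
        if balance >= DROP_THRESHOLD:
            balance += DROP_CHARGE
            history.append(("dropped", balance))
        else:
            history.append(("accepted", balance))
            break
    return history


# A peer dropped at 25005 and charged 6000 starts at 31005.
impatient = simulate(31005, retry_interval=1, attempts=10)
patient = simulate(31005, retry_interval=60, attempts=10)
```

Under these assumptions the peer that retries every second is dropped on every attempt and its balance keeps climbing, while the peer that waits a minute gets in on the first try. If fixed IPs really do retry quickly, that would match the "never regained until restart" symptom.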

This hypothesis could be supported if affected operators grep their debug.log for "dropped". They'll see messages like

2025-Mar-28 10:39:05.388214684 UTC Resource:WRN Consumer entry A.B.C.D dropped with balance 25005 at or above drop threshold 25000
2025-Mar-28 10:39:05.388275701 UTC Resource:WRN Charging A.B.C.D for dropped ($6000)

It's normal to see three or four sets of these messages within a second for the same peer. However, if there are more than that, and they are separated by even a couple of seconds, that would lend weight to this hypothesis.
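If eyeballing the grep output gets tedious, a small script can do the grouping. This is a minimal sketch, assuming the log lines look exactly like the two samples above; `parse_drops`, `count_bursts`, and `repeated_drops` are hypothetical helper names, and the one-second gap is just my reading of "separated by even a couple of seconds".

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Matches only the "Consumer entry ... dropped with balance" lines.
# ASSUMPTION: the peer identifier is a single whitespace-free token.
DROP_RE = re.compile(
    r"^(\d{4}-[A-Za-z]{3}-\d{2} \d{2}:\d{2}:\d{2})(?:\.(\d+))? UTC "
    r"Resource:WRN Consumer entry (\S+) dropped with balance"
)


def parse_drops(lines):
    """Map each peer to the timestamps at which it was dropped."""
    drops = defaultdict(list)
    for line in lines:
        match = DROP_RE.match(line)
        if not match:
            continue
        stamp, frac, peer = match.groups()
        when = datetime.strptime(stamp, "%Y-%b-%d %H:%M:%S")
        if frac:
            # The log has nanoseconds; timedelta keeps microseconds.
            when += timedelta(seconds=float("0." + frac))
        drops[peer].append(when)
    return drops


def count_bursts(times, gap_seconds=1.0):
    """Count groups of drops separated by more than gap_seconds."""
    bursts = 0
    previous = None
    for current in sorted(times):
        if previous is None or (current - previous).total_seconds() > gap_seconds:
            bursts += 1
        previous = current
    return bursts


def repeated_drops(lines, gap_seconds=1.0):
    """Return {peer: burst_count} for peers dropped in more than one burst."""
    result = {}
    for peer, times in parse_drops(lines).items():
        bursts = count_bursts(times, gap_seconds)
        if bursts > 1:
            result[peer] = bursts
    return result
```

To use it, pass an open debug.log file (or any iterable of lines) to `repeated_drops`. Peers that show up in the result were dropped, came back, and were dropped again, which is the pattern that would lend weight to the hypothesis.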

Questions:

  1. Does a rippled node inform a peer that it's being dropped for resources?
  2. Does the dropped peer keep track of that?
  3. Do configured fixed IPs bypass any kind of built-in delay for reconnecting?

ximinez avatar Mar 31 '25 18:03 ximinez

rippled server_info shows "peer_disconnects" : "583" on one node and "peer_disconnects" : "3107" on another.

cat /etc/opt/ripple/rippled.cfg | grep dropped

returns a single result.

Image

Checked 3 nodes with high disconnect values, and all three of them have only this one line in the grep of the logging. Not seeing the expected

2025-Mar-28 10:39:05.388214684 UTC Resource:WRN Consumer entry A.B.C.D dropped with balance 25005 at or above drop threshold 25000
2025-Mar-28 10:39:05.388275701 UTC Resource:WRN Charging A.B.C.D for dropped ($6000)

On my nodes I do specify quite a number of fixed IPs, so I think further investigation in that direction would be appropriate.

shortthefomo avatar Apr 06 '25 18:04 shortthefomo

cat /etc/opt/ripple/rippled.cfg | grep dropped

That's the config file, not the log. 😊

ximinez avatar Apr 07 '25 19:04 ximinez

dope lol... fixed command, but still very little across the 3 instances.

Image Image Image

shortthefomo avatar Apr 08 '25 01:04 shortthefomo

dope lol... fixed command, but still very little across the 3 instances. Image

(When sharing logs and such, please copy and paste them as text. They're much easier for me to read and work with than screenshots.)

This is valuable information. It shows a handful of the disconnect message pairs within about 1/10 of a second, which doesn't show attempts to reconnect. Do me a favor and try

cat /var/log/rippled/debug.log | grep -e dropped -e "Over resource limit" -e "charge: Resources"

and

cat /var/log/rippled/debug.log | grep 147.93.41.127

ximinez avatar Apr 10 '25 19:04 ximinez