Not regaining peers until restart (Version: 2.4)
Issue Description
Multiple validator operators report that they consistently lose peers over time and don't regain them until a restart.
Steps to Reproduce
Run rippled configured as a validator (it may not need to be a validator) and monitor peers from the first network sync. Start with the default hub config. Restart after a day.
Expected Result
Dropped peers will be regained over time. A restart has no magic effect on peering.
Actual Result
Peers slowly go down over time. The peers will not be regained until service is restarted.
Environment
Mainnet production, master releases 2.3 and 2.4. Could be related to having fixed_ips, as many validators do.
Supporting Files
Quick thoughts:
- When a peer is dropped, its resource charge is still tracked and decreases over time.
- If a peer reconnects before the charge gets below an acceptable value, it will be dropped again, and charged for that drop.
- I think a "default" node has enough other IPs available that if one peer drops me, I'll have plenty of others to choose from. Fixed IPs, by contrast, want to reestablish those specific connections.
- I bet they're not waiting long enough before reconnecting.
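The drop-and-recharge hypothesis above can be sketched as a toy model. The drop threshold (25000) and drop charge (6000) come from the log lines quoted below; the exponential decay and its 32-second half-life are purely hypothetical stand-ins, not rippled's actual resource-decay rule. Under these assumptions, a peer that retries too quickly keeps getting dropped and its balance keeps growing, while one that waits long enough gets back in.

```python
import math

# Values taken from the log lines quoted in this issue.
DROP_THRESHOLD = 25_000  # "at or above drop threshold 25000"
DROP_CHARGE = 6_000      # "Charging A.B.C.D for dropped ($6000)"

# HYPOTHETICAL: exponential decay with a 32 s half-life. This is an
# illustrative assumption, not rippled's actual decay rule.
HALF_LIFE_S = 32.0


def decayed(balance: float, seconds: float) -> float:
    """Balance remaining after `seconds` of exponential decay."""
    return balance * 0.5 ** (seconds / HALF_LIFE_S)


def min_wait(balance: float) -> float:
    """Seconds until `balance` decays to DROP_THRESHOLD (0 if already below)."""
    if balance < DROP_THRESHOLD:
        return 0.0
    return HALF_LIFE_S * math.log2(balance / DROP_THRESHOLD)


def simulate(retry_interval: float, max_retries: int, balance_at_drop: float = 25_005):
    """Return (total_drops, reconnected) for a peer retrying every retry_interval s.

    Model: each reconnect while the balance is still >= DROP_THRESHOLD is
    dropped again and charged DROP_CHARGE, as hypothesized above.
    """
    balance = balance_at_drop + DROP_CHARGE  # initial drop plus its charge
    drops = 1
    for _ in range(max_retries):
        balance = decayed(balance, retry_interval)
        if balance < DROP_THRESHOLD:
            return drops, True  # retry succeeds: balance is acceptable again
        balance += DROP_CHARGE  # dropped again and charged for it
        drops += 1
    return drops, False


if __name__ == "__main__":
    print(f"minimum wait after first drop: {min_wait(25_005 + DROP_CHARGE):.1f} s")
    for interval in (1.0, 5.0, 15.0, 60.0):
        print(interval, simulate(interval, max_retries=50))
```

With these made-up numbers, retrying every 1 or 5 seconds never gets back under the threshold, while waiting 15 seconds or more does; the real cutoff depends on rippled's actual decay behavior.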
The hypothesis could be confirmed if affected operators grep their debug.log for "dropped".
They'll see messages like
2025-Mar-28 10:39:05.388214684 UTC Resource:WRN Consumer entry A.B.C.D dropped with balance 25005 at or above drop threshold 25000
2025-Mar-28 10:39:05.388275701 UTC Resource:WRN Charging A.B.C.D for dropped ($6000)
It's normal to see three or four sets of these messages within a second for the same peer. However, if there are more than that, and they are separated by even a couple of seconds, that would lend weight to this hypothesis.
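A rough way to apply this heuristic to a debug.log is to group the "Consumer entry ... dropped" warnings by peer address and count bursts separated by more than a couple of seconds. This is a sketch: the regular expression assumes the exact log format shown above, the 2-second gap is an arbitrary reading of "a couple of seconds", and the function names are hypothetical.

```python
import re
import sys
from collections import defaultdict
from datetime import datetime, timezone

# Matches the "Consumer entry ... dropped" warning in the format quoted above.
DROP_RE = re.compile(
    r"^(\d{4}-[A-Za-z]{3}-\d{2} \d{2}:\d{2}:\d{2})\.(\d+) UTC "
    r"Resource:WRN Consumer entry (\S+) dropped"
)


def parse_drops(lines):
    """Map each peer address to the sorted epoch seconds of its drop warnings."""
    drops = defaultdict(list)
    for line in lines:
        m = DROP_RE.match(line)
        if m is None:
            continue  # not a drop warning (e.g. the "Charging ..." line)
        base = datetime.strptime(m.group(1), "%Y-%b-%d %H:%M:%S")
        # strptime's %f only accepts 6 digits, so add the nanosecond fraction by hand.
        seconds = base.replace(tzinfo=timezone.utc).timestamp() + float("0." + m.group(2))
        drops[m.group(3)].append(seconds)
    return {peer: sorted(ts) for peer, ts in drops.items()}


def count_bursts(timestamps, gap=2.0):
    """Count clusters of sorted timestamps separated by more than `gap` seconds."""
    if not timestamps:
        return 0
    bursts = 1
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > gap:
            bursts += 1
    return bursts


def repeat_offenders(lines, gap=2.0):
    """Peers dropped in more than one burst, the pattern that would support the hypothesis."""
    result = {}
    for peer, ts in parse_drops(lines).items():
        n = count_bursts(ts, gap)
        if n > 1:
            result[peer] = n
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for peer, n in sorted(repeat_offenders(log).items()):
            print(f"{peer}: dropped in {n} separate bursts")
```

Saved under any name (say, `drops.py`), it could be run as `python3 drops.py /var/log/rippled/debug.log`; a peer listed with several bursts would lend weight to the hypothesis.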
Questions:
- Does a rippled node inform a peer that it's being dropped for resources?
- Does the dropped peer keep track of that?
- Do configured fixed IPs bypass any kind of built-in delay for reconnecting?
rippled server_info reports "peer_disconnects" : "583" on one node and "peer_disconnects" : "3107" on another.
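To compare these counters across nodes without reading the full output, a small parser can pull them out of the `rippled server_info` text. This is a sketch that assumes the CLI output nests the fields under `result.info` and reports the counts as strings, as in the values above; `peer_disconnects_resources` (disconnects attributed to resource limits) is read only if the node reports it. A large gap between the two counters would suggest most disconnects are not resource drops.

```python
import json


def disconnect_counts(server_info_text: str) -> dict:
    """Extract disconnect counters from `rippled server_info` output.

    ASSUMPTION: fields live under {"result": {"info": {...}}} and counts are
    reported as strings, as in the values quoted in this issue.
    """
    # Skip anything printed before the JSON object itself.
    payload = server_info_text[server_info_text.index("{"):]
    info = json.loads(payload)["result"]["info"]
    counts = {"peer_disconnects": int(info.get("peer_disconnects", 0))}
    # Only present if this rippled version reports resource-related disconnects.
    if "peer_disconnects_resources" in info:
        counts["peer_disconnects_resources"] = int(info["peer_disconnects_resources"])
    return counts
```

For example, save the output with `rippled server_info > info.txt` on each node and pass each file's contents to `disconnect_counts`.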
cat /etc/opt/ripple/rippled.cfg | grep dropped
returns a single result.
Checked 3 nodes with high disconnect values, and all three of them have only this one line in the grep output. I'm not seeing the expected
2025-Mar-28 10:39:05.388214684 UTC Resource:WRN Consumer entry A.B.C.D dropped with balance 25005 at or above drop threshold 25000
2025-Mar-28 10:39:05.388275701 UTC Resource:WRN Charging A.B.C.D for dropped ($6000)
On my nodes I do specify quite a number of fixed IPs, so I think further investigation in that direction would be appropriate.
cat /etc/opt/ripple/rippled.cfg | grep dropped
That's the config file, not the log. 😊
dope lol... fixed command, but still very little across the 3 instances.
(When sharing logs and such, please copy and paste them as text. They're much easier for me to read and work with than screenshots.)
This is valuable information. It shows a handful of the disconnect message pairs within about 1/10 of a second, which doesn't show attempts to reconnect. Do me a favor and try
cat /var/log/rippled/debug.log | grep -e dropped -e "Over resource limit" -e "charge: Resources"
and
cat /var/log/rippled/debug.log | grep 147.93.41.127
