Looking-glass support. Add `monitor` to Connect-As in peer protocol.

Open RichardAH opened this issue 3 years ago • 0 comments

Summary

As the network grows and public usage of the XRPL increases, the ability to monitor and debug the peer behaviour of individual nodes becomes increasingly important. To this end, I have created a looking-glass tool (WIP: https://github.com/RichardAH/xrpl-peermon) that can connect to a particular peer and report the packets it sends. Recently it was used to partially debug the halt of Nov 3 2021.

Motivation

The mesh network is not as robust as previously believed, and while it remains a black box, the issues with it are unlikely to be resolved.

Solution

To better support looking-glass tools in future, I propose that the `Connect-As` header in the peer connection upgrade request be extended to accept `monitor` in addition to `peer`, with the following key differences:

  1. A monitor is a mute peer. It is not expected to have any ledgers, deltas, shards, transactions, manifests, validations or state data, but it should still receive the packets that would be sent to it if it were a well-connected peer.
  2. A monitor may receive additional packets not normally in the peer protocol, as per any extended packet types created for this purpose. These might include information about jobs, memory, counters and so on. To ensure this additional data isn't abused, a `monitor_reservations_add` RPC call is recommended. This reservation only restricts the output of these novel packets, not the general capacity to connect as a monitor (which all nodes should support if their peer port is open).
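
As a rough illustration of the two points above, the following Python sketch builds the text of an upgrade request carrying the proposed `Connect-As: monitor` value, along with a JSON-RPC body for the suggested reservation call. It is only a sketch under stated assumptions. The function names, the `User-Agent` string and the `XRPL/2.0` protocol token are placeholders. The `public_key` and `description` parameters are assumed by analogy with the existing `peer_reservations_add` method and are not specified in this proposal. A real handshake also needs further headers (for example the node's public key and a session signature), which are omitted here.

```python
import json

# Values the Connect-As header could take under this proposal.
# "monitor" is the proposed addition; it is not part of the current protocol.
ALLOWED_ROLES = ("peer", "monitor")


def build_upgrade_request(connect_as: str = "peer",
                          protocol: str = "XRPL/2.0",
                          user_agent: str = "peermon-sketch") -> str:
    """Return the HTTP text of a peer-port upgrade request (illustrative only).

    Headers a real handshake requires beyond these are deliberately omitted.
    """
    if connect_as not in ALLOWED_ROLES:
        raise ValueError(f"unsupported Connect-As value: {connect_as!r}")
    lines = [
        "GET / HTTP/1.1",
        f"User-Agent: {user_agent}",
        "Connection: Upgrade",
        f"Upgrade: {protocol}",
        f"Connect-As: {connect_as}",
    ]
    # HTTP headers are CRLF-separated and terminated by an empty line.
    return "\r\n".join(lines) + "\r\n\r\n"


def build_monitor_reservation_request(public_key: str,
                                      description: str = "") -> str:
    """Return a JSON-RPC body for the hypothetical monitor_reservations_add call.

    The parameter names are an assumption modelled on peer_reservations_add.
    """
    return json.dumps({
        "method": "monitor_reservations_add",
        "params": [{"public_key": public_key, "description": description}],
    })


if __name__ == "__main__":
    print(build_upgrade_request("monitor"))
    print(build_monitor_reservation_request("<monitor node public key>",
                                            "looking-glass tool"))
```

Under this reading, a node would accept any connection that asks for `monitor` on an open peer port, and would consult the reservation list only when deciding whether to emit the extended packet types.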

Paths Not Taken

Additional logging by rippled would not solve this issue because:

  1. Any such logs exist only on each individual node, so they cannot give network operators real-time information about the actual behaviour of the foreign nodes with which they are peering.
  2. It requires self-reporting by the same code that is possibly the cause of the misbehaviour.

RichardAH, Nov 11 '21 13:11