rippled icon indicating copy to clipboard operation
rippled copied to clipboard

SNTP Clock issues

Open HowardHinnant opened this issue 2 years ago • 6 comments

Issue Description

The purpose of this issue is to provide a place to discuss a couple of issues surrounding the time keeping used to create ledger close times.

Currently rippled communicates with an NTP server to get the correct time, and this time serves as the basis for determining ledger close times. Which NTP server is used is configurable in rippled.cfg under the [sntp-servers] tag where one or more servers can be listed, and one will be chosen for the job. It might be possible, however remotely, for a malicious actor to intercept communications between a node in the XRP network and its NTP server, and shift the network time of the node away from its peers.

  1. Should a node use only one NTP server? We could, for example, query three or more servers, eliminate any reporting a time sufficiently far from the mean, and average the rest for the current time.
  2. Should we be using NTP servers at all? We could just query the OS for the current time (e.g. std::chrono::system_clock::now()) and leave it up to server operators to keep their machines temporarily synced. This would allow us to remove the code we currently have that reaches out to NTP servers, and remove the associated config tag.
  3. Is the status quo the best option and thus we should leave well enough alone?

HowardHinnant avatar Jun 14 '22 20:06 HowardHinnant

Maybe we could check (on linux/macos) whether ntpd is running, and if it is, just use the system's time (assuming ntpd will do a better job than we could avoiding drift). If ntpd is not running, or on windows, revert to querying NTP servers ourselves.

greg7mdp avatar Jun 15 '22 06:06 greg7mdp

Doesn't Windows also normally have an NTP server it is running?

HowardHinnant avatar Jun 15 '22 13:06 HowardHinnant

Maybe we could check (on linux/macos) whether ntpd is running, and if it is, just use the system's time (assuming ntpd will do a better job than we could avoiding drift). If ntpd is not running, or on windows, revert to querying NTP servers ourselves.

I think that if the decision is to rely on an external process (like chrony, ntpd or even sntpd) then we explicitly document a dependency on having time synchronization software running and then rely on admins ensuring that the environment inside which the daemon runs is properly configured.

I wouldn't have code that's mostly dead but spring to life in misconfigured systems; that's generally a recipe for disaster.

Considering that rippled refuses to establish peer connections if the clock drift between the server and its peer exceeds 20 seconds, I think that hosts without synchronized clocks that are wildly off would generally find themselves unable to connect (a message is logged) and would clue the admin in.

Long story short, I'm fine with removing this. Hey @JoelKatz, what's your take? We've been using this code since... must be at least 1998? Are you OK if it goes the way of the dodo and rippled requires the underlying system to have a synchronized clock?

nbougalis avatar Jun 18 '22 04:06 nbougalis

I have no objection to removing it.

JoelKatz avatar Jun 18 '22 04:06 JoelKatz

I also would prefer removing the NTP stuff from the code base. It is a nice safety net in some edge cases, but it masks operator errors and could lead to unexpected situations. It also seems to have very little impact on the protocol etc. anyways.

MarkusTeufelberger avatar Jun 21 '22 09:06 MarkusTeufelberger

Here is a patch that removes the SNTP code for review: https://github.com/HowardHinnant/rippled/commit/6c239bde8d82214286d2f538092eabb1beffafc3

HowardHinnant avatar Aug 18 '22 12:08 HowardHinnant