systemd icon indicating copy to clipboard operation
systemd copied to clipboard

NTS support for systemd-timesyncd

Open squell opened this issue 3 months ago • 6 comments

This is intended to eventually close #9481

Some comments:

  • To keep the modification well-separated, the additions to timesyncd are split up two clearly distinguishable parts:

    • The NTS logic itself; these were developed first in the repo https://github.com/squell/nts
    • Modifications to integrate those into the timesyncd sources.
  • Currently, to select an "NTS server" an NTS= line needs to be added to the configuration; because the intent is "secure time", enabling it will ignore all other NTP servers ~~(except for the FallbackNTP)~~. Maybe in the future (if NTS gains traction) it might be wise to configure this differently (e.g. RequireSureTime=yes or something).

  • ~~The function manager_nts_obtain_agreement currently does busy-waiting (which has the advantage of making it perhaps a bit easier to follow the logic). This still should be integrated in the async event loop! (hence the "draft" status of this PR).~~

  • ~~The current meson build system modifications take some pains to make the selection of TLS and crypto backends flexible. I've learned from @poettering that the intent is to consolidate on OpenSSL; so that part can be simplified. Do note that the minimum required OpenSSL version is v3.5 due to a bug in OpenSSL (and a workaround for that bug not being universally compatible with existing NTS servers). In any case, this PR still allows selecting gnutls for TLS and nettle and gcrypt as alternative crypto backends. Not included in this PR (but available if desired) is a crypto backend that relies on libaes_siv, which would allow using older OpenSSL versions.~~

  • I'm not 100% sure I didn't miss some config file for integrating the tests and fuzzing code, if there's anything missing, a PR on our forked repo or Suggestion is appreciated!

  • To get in the weeds on NTS protocol details:

    • This patch set supports plenty of AEAD choices. This PR will signal a preference for AES-GCM-SIV crypto (because it is most space-efficient). This will only work with up-to-date Chrony implementations (4.6.1 or later, for annoying reasons). In my practical tests I haven't found any NTS servers running an older Chrony version, and I don't think it's very useful to make this fresh NTS implementation "bug-compatible" with older Chrony versions. By the time NTS lands in timesyncd I would hope that this issue is long forgotten. But if this is seen as problematic, it's very easy to make the default AES-SIV-CMAC.
    • In case there are missing cookies, the code will only ever replenish these "one cookie at a time". I've discovered (and been advised by @davidv1992) that many routers don't like large NTP packets, so this is the more robust option.
    • In NTS it's possible to request additional cookies by adding a placeholder field in the encrypted part of the NTP request and/or in the unencrypted part; right now this code will only do it in the unencrypted part. Some practical testing is required but I believe not all NTS servers out there like it when the "authenticated and encrypted extension field" contains a "Cookie Placeholder" (or in fact anything else) and the current set-up is the most compatible choice.

squell avatar Sep 18 '25 13:09 squell

hmm, what's the license of the code you are copying in in the first commit? we really want spdx headers.

poettering avatar Sep 18 '25 20:09 poettering

i did some initial review now. please have a closer look at the coding style docs.

poettering avatar Sep 18 '25 20:09 poettering

Thanks for the quick initial review!

Note that the tcp_connect will eventually be completely removed as it's only part of the blocking handshake, which as mentioned needs to be rewritten to fit inside the event system.

hmm, what's the license of the code you are copying in in the first commit? we really want spdx headers.

They're LGPL2.1+ licensed (since they're specifically targetted for timesyncd).

squell avatar Sep 19 '25 09:09 squell

For my own reference, I'm tracking more general issues that still need attention on the forked repo: https://github.com/pendulum-project/nts-timesyncd/issues

squell avatar Sep 19 '25 12:09 squell

Last pushes addresses the comments and removes some of the stub elements (the non-async parts and the superfluous socket handling code).

The business logic of server selection still needs to be checked; right now it just allows testing in the happy scenario. With ordinary NTP there's only one server so the client can decide to retry it or give up, but with NTS the scenarios are different:

  • the NTS key exchange server is failing (we can retry or skip it, like a NTP server)
  • the NTS key exchange succeeded but the communicated secured NTP server is failing
    • this happens during normal operation (the NTS server can decide that all cookies need to be refreshed); in that case the NTS key exchange simply needs to be redone
    • it could also be a DoS scenario (someone is blocking the UDP packets from the NTP server but the NTS key exchange is working fine), in which case the entire NTS server should be skipped

squell avatar Nov 18 '25 09:11 squell

Unit tests are fine, but a large feature like this needs a full system integration tests added to TEST-45-TIMEDATE.sh as well

Is this allowed to use external services? A relatively easy "happy path" and "graceful failure" test would be to try a well-known NTS server and an invalid one (in various orders). A more involved approach would be to write a mock NTS server.

As an aside, are the fuzz tests in the right spot? One doc mentions it should be where they are now, but another says all fuzzers would need to be in src/fuzz/. I have the feeling I have missed something here. Due to the non-trivial parsing aspects of NTS records and NTP extension fields compared to plain NTP they're quite essential.

squell avatar Nov 28 '25 09:11 squell

Is this allowed to use external services? A relatively easy "happy path" and "graceful failure" test would be to try a well-known NTS server and an invalid one (in various orders). A more involved approach would be to write a mock NTS server.

No, as that would likely introduce flakiness, there's no telling where the tests will be running in and if it even has access to the internet. Check how other tests are implemented, like the resolved ones, where a local server is spawned in the VM under test.

bluca avatar Dec 18 '25 00:12 bluca