avahi icon indicating copy to clipboard operation
avahi copied to clipboard

Use monotonic clock when possible

Open JPEWdev opened this issue 8 years ago • 28 comments

Changes avahi to use a monotonic clock (clock_gettime() + CLOCK_MONOTONIC) for timers on systems that support it instead of gettimeofday(). This prevents changes to the system clock from disrupting the avahi timers. In particular, if the system clock is set backward by a significant amount, it can prevent avahi from sending any queries or responses because the old entries in the respective history lists won't time out.

JPEWdev avatar Jan 13 '17 19:01 JPEWdev

I believe this changes the API of libavahi-common, and would require bumping the version number of that library to 9.0.0?

JPEWdev avatar Jan 13 '17 19:01 JPEWdev

Hello Avahi team,

Any chances of this commit making into master branch?

Making Avahi independent of system time will help resolve few of the issues which i am facing.

Thanks Jitendra

jitendrajagasia avatar Jul 11 '17 03:07 jitendrajagasia

Absolutely, I am keep to merge this patch and it's next up on my list of things to merge. I apologise for the delay but I am eagerly trying to allocate more time to the project to ensure things are happening on a better time frame. Thanks!

lathiat avatar Jul 13 '17 04:07 lathiat

I think this is ready to be merged. I had pushed some other commits to my branch, but I removed them and will open a separate pull request for them.

JPEWdev avatar May 24 '19 13:05 JPEWdev

Thanks for the PR. My main concern is if we will run into problems with various implementations of the Poll API comparing to gettimeofday() instead. Did you put any thought or testing into that?

lathiat avatar Jun 14 '19 06:06 lathiat

@lathiat We are also being hit by this problem and verified the problem is solved by this PR. The test your are looking for is one which demonstrates the problem being fixed, or a test that it does not break anything else?

ptsneves avatar Dec 05 '19 11:12 ptsneves

This patch now runs on our embedded devices for multiple years. I couldn't report any problem with it. Hope it makes it to 0.9/ would be good if merged. Thanks everyone.

mbodmer avatar Aug 26 '21 07:08 mbodmer

Reviving this PR as we also had another report of it breaking avahi-autoipd in #359

Does anyone know if this could break any users of avahi-common/AvahiPoll? I'm thinking if they supply us values from gettimeofday or compare something we supply to them they might break. Though I guess at least by changing the type those people should get a compile time failure. I'm thinking probably only poll implementations would be the problematic things? Most of which are in-tree.

lathiat avatar Sep 17 '21 12:09 lathiat

Seems pretty much only the AvahiPoll APIs we have to worry about. Some poll implementations like avahi-glib (updated in the PR) mostly don't matter as we end up passing avahi-glib a differential time we calculate anyway. However others (like libevent) seem like they'll need to switch to using the monotonic version of their relevant timer API as we actually pass the gettimeofday value. So I think having this break the API is possibly more of a feature than a bug forcing anything to check and update. Although I do wonder if there's a way to make it work for stable updates, even if that's just trying to detect a clock change or something like that. And check for timers going too far backwards or forwards and somehow deal with that better than we currently do. There are a few high CPU bugs that I think are also caused by this when time jumps.

lathiat avatar Sep 17 '21 14:09 lathiat

Seems pretty much only the AvahiPoll APIs we have to worry about. Some poll implementations like avahi-glib (updated in the PR) mostly don't matter as we end up passing avahi-glib a differential time we calculate anyway. However others (like libevent) seem like they'll need to switch to using the monotonic version of their relevant timer API as we actually pass the gettimeofday value. So I think having this break the API is possibly more of a feature than a bug forcing anything to check and update. Although I do wonder if there's a way to make it work for stable updates, even if that's just trying to detect a clock change or something like that. And check for timers going too far backwards or forwards and somehow deal with that better than we currently do. There are a few high CPU bugs that I think are also caused by this when time jumps.

Looks like it's just libevent that's still using gettimeofday(); it looks like it's only using it to calculate a relative time. I'd prefer to avoid using some heuristic to detect "time jumps" while continuing to use gettimeofday(); it feels like that is just prone to strange edge cases over using the monotonic timer, which seems like the correct fix in this case.

JPEWdev avatar Sep 17 '21 14:09 JPEWdev

I agree it’s the correct fix for master.

lathiat avatar Sep 17 '21 15:09 lathiat

While I think it is correct to use monotonic time, I think there is relative high chance this change would break something.

pemensik avatar Nov 24 '22 16:11 pemensik

This PR seems to trigger a lot of warnings like

timeval.c: In function 'avahi_now':
timeval.c:86:5: warning: this use of "defined" may not be portable [-Wexpansion-to-defined]
   86 | #if USE_MONOTONIC
      |     ^~~~~~~~~~~~~
timeval.c:86:5: warning: this use of "defined" may not be portable [-Wexpansion-to-defined]
timeval.c:92:5: warning: this use of "defined" may not be portable [-Wexpansion-to-defined]
   92 | #if USE_MONOTONIC
      |     ^~~~~~~~~~~~~
...
dbus-protocol.c: In function ‘dbus_select_browser’:
dbus-protocol.c:1083:23: warning: passing argument 1 of ‘avahi_elapse_time’ from incompatible pointer type [-Wincompatible-pointer-types]
 1083 |     avahi_elapse_time(&tv, DEFAULT_START_DELAY_MS, 0);
      |                       ^~~
      |                       |
      |                       struct timeval *
In file included from ../avahi-common/watch.h:29,
                 from ../avahi-common/dbus-watch-glue.h:25,
                 from dbus-protocol.c:44:
../avahi-common/timeval.h:62:61: note: expected ‘struct AvahiTimeVal *’ but argument is of type ‘struct timeval *’
   62 | struct AvahiTimeVal *avahi_elapse_time(struct AvahiTimeVal *tv, unsigned ms, unsigned j);
      |                                        ~~~~~~~~~~~~~~~~~~~~~^~

and so on so if it's decided that it's the way to go I think it should be rebased on top of the master branch to trigger the CI. (once https://github.com/lathiat/avahi/pull/425 and https://github.com/lathiat/avahi/pull/422 get merged the CI should at least be able to compile-test it thoroughly).

evverx avatar Dec 09 '22 22:12 evverx

warning: this use of "defined" may not be portable [-Wexpansion-to-defined]

That should be fixable by changing

#define USE_MONOTONIC (defined(HAVE_CLOCK_GETTIME) && defined(_POSIX_MONOTONIC_CLOCK))

to

#if (defined(HAVE_CLOCK_GETTIME) && defined(_POSIX_MONOTONIC_CLOCK))
#define USE_MONOTONIC 1
#endif

mvduin avatar Dec 10 '22 08:12 mvduin

Ok. I'll rebase the PR and make the fixes on Monday. We've also been using this patch for years without issue, it would be really nice to have it upstream

JPEWdev avatar Dec 10 '22 14:12 JPEWdev

@JPEWdev I'm not sure the CI will be ready on Monday here but I think I can build this PR on various architectures using my fork in the meantime. At this point the CI just builds avahi on x86_64 without treating warnings as errors and runs a few unit tests. It isn't much but with 34d43d2245f11 merged those unit tests somewhat cover the code touching timers (https://coveralls.io/builds/54825736). They are pretty basic and time jumps aren't tested at all but that's better than nothing I think.

it would be really nice to have it upstream

Having read this thread and a couple of linked issues as far as I understand there seems to be a consensus that avahi should switch from gettimeofday one way or another (and I agree with that too). There are concerns that it can break something as far as I can see but I don't think it should prevent this PR from being resurrected in the meantime.

Just to clarify I'm basically going through open PRs to try to figure out how mergeable they are and whether they are still active or not and based on the fast replies I got here I have to say that this 5-year-old PR is much more alive than I thought. I think it should hopefully help to prioritize it a bit.

evverx avatar Dec 11 '22 04:12 evverx

Turns out this PR can be (somewhat) smoke-tested using the existing tests because without this patch avahi-client/client-test gets stuck when the time jumps backwards and resumes when it goes back. (I'm not sure whether it's possible to tweak the time like that on GHActions directly though).

It can't be used to figure out what can break though. I think one option would be to take a look at packages where AvahiPoll is used and judging by https://github.com/MusicPlayerDaemon/MPD/commit/7a6823dcdfc7056d3c0e93f31aea49a99a2fddd9 it appears at least there gettimeofday is used to schedule events from the past immediately to fix "broken avahi timeouts".

evverx avatar Dec 12 '22 06:12 evverx

Looks like the "expansion-to-defined" warnings are gone. But clang still complains about incompatible pointer types. As far as I can see they come from the code like 17a0aa98755bc20d and https://github.com/lathiat/avahi/pull/186 where new timevals were introduced after this PR had been opened:

dbus-protocol.c:1075:23: error: incompatible pointer types passing 'struct timeval *' to parameter of type 'struct AvahiTimeVal *' [-Werror,-Wincompatible-pointer-types]
    avahi_elapse_time(&tv, DEFAULT_START_DELAY_MS, 0);
                      ^~~
../avahi-common/timeval.h:62:61: note: passing argument to parameter 'tv' here
struct AvahiTimeVal *avahi_elapse_time(struct AvahiTimeVal *tv, unsigned ms, unsigned j);
                                                            ^
dbus-protocol.c:1081:65: error: incompatible pointer types passing 'struct timeval *' to parameter of type 'const struct AvahiTimeVal *' [-Werror,-Wincompatible-pointer-types]
            db->delay_timeout = poll_api->timeout_new(poll_api, &tv, GET_DBUS_DELAY_FUNC(DomainBrowserInfo, domain_browser), db);
                                                                ^~~
dbus-protocol.c:1088:67: error: incompatible pointer types passing 'struct timeval *' to parameter of type 'const struct AvahiTimeVal *' [-Werror,-Wincompatible-pointer-types]
            stbi->delay_timeout = poll_api->timeout_new(poll_api, &tv, GET_DBUS_DELAY_FUNC(ServiceTypeBrowserInfo, service_type_browser), stbi);
                                                                  ^~~
dbus-protocol.c:1095:66: error: incompatible pointer types passing 'struct timeval *' to parameter of type 'const struct AvahiTimeVal *' [-Werror,-Wincompatible-pointer-types]
            sbi->delay_timeout = poll_api->timeout_new(poll_api, &tv, GET_DBUS_DELAY_FUNC(ServiceBrowserInfo, service_browser), sbi);
                                                                 ^~~
dbus-protocol.c:1102:66: error: incompatible pointer types passing 'struct timeval *' to parameter of type 'const struct AvahiTimeVal *' [-Werror,-Wincompatible-pointer-types]
            sri->delay_timeout = poll_api->timeout_new(poll_api, &tv, GET_DBUS_DELAY_FUNC(AsyncServiceResolverInfo, service_resolver), sri);
                                                                 ^~~
dbus-protocol.c:1109:66: error: incompatible pointer types passing 'struct timeval *' to parameter of type 'const struct AvahiTimeVal *' [-Werror,-Wincompatible-pointer-types]
            hri->delay_timeout = poll_api->timeout_new(poll_api, &tv, GET_DBUS_DELAY_FUNC(AsyncHostNameResolverInfo, host_name_resolver), hri);
                                                                 ^~~
dbus-protocol.c:1116:66: error: incompatible pointer types passing 'struct timeval *' to parameter of type 'const struct AvahiTimeVal *' [-Werror,-Wincompatible-pointer-types]
            ari->delay_timeout = poll_api->timeout_new(poll_api, &tv, GET_DBUS_DELAY_FUNC(AsyncAddressResolverInfo, address_resolver), ari);
                                                                 ^~~
dbus-protocol.c:1123:66: error: incompatible pointer types passing 'struct timeval *' to parameter of type 'const struct AvahiTimeVal *' [-Werror,-Wincompatible-pointer-types]
            rbi->delay_timeout = poll_api->timeout_new(poll_api, &tv, GET_DBUS_DELAY_FUNC(RecordBrowserInfo, record_browser), rbi);
                                                                 ^~~
8 errors generated.
libevent-watch-test.c:67:23: error: incompatible pointer types passing 'struct timeval *' to parameter of type 'struct AvahiTimeVal *' [-Werror,-Wincompatible-pointer-types]
    avahi_elapse_time(&tv, 1000, 0);
                      ^~~
../avahi-common/timeval.h:62:61: note: passing argument to parameter 'tv' here
struct AvahiTimeVal *avahi_elapse_time(struct AvahiTimeVal *tv, unsigned ms, unsigned j);
                                                            ^
libevent-watch-test.c:68:28: error: incompatible pointer types passing 'struct timeval *' to parameter of type 'const struct AvahiTimeVal *' [-Werror,-Wincompatible-pointer-types]
    api->timeout_update(t, &tv);
                           ^~~
libevent-watch-test.c:86:23: error: incompatible pointer types passing 'struct timeval *' to parameter of type 'struct AvahiTimeVal *' [-Werror,-Wincompatible-pointer-types]
    avahi_elapse_time(&tv, 1000, 0);
                      ^~~
../avahi-common/timeval.h:62:61: note: passing argument to parameter 'tv' here
struct AvahiTimeVal *avahi_elapse_time(struct AvahiTimeVal *tv, unsigned ms, unsigned j);
                                                            ^
libevent-watch-test.c:87:27: libevent-watch.c:270:22: error: incompatible function pointer types assigning to 'AvahiTimeout *(*)(const AvahiPoll *, const struct AvahiTimeVal *, AvahiTimeoutCallback, void *)' (aka 'struct AvahiTimeout *(*)(const struct AvahiPoll *, const struct AvahiTimeVal *, void (*)(struct AvahiTimeout *, void *), void *)') from 'AvahiTimeout *(const AvahiPoll *, const struct timeval *, AvahiTimeoutCallback, void *)' (aka 'struct AvahiTimeout *(const struct AvahiPoll *, const struct timeval *, void (*)(struct AvahiTimeout *, void *), void *)') [-Werror,-Wincompatible-function-pointer-types]
        ep->api.timeout_new = timeout_new;
                            ^ ~~~~~~~~~~~
libevent-watch.c:272:25: error: incompatible function pointer types assigning to 'void (*)(AvahiTimeout *, const struct AvahiTimeVal *)' (aka 'void (*)(struct AvahiTimeout *, const struct AvahiTimeVal *)') from 'void (AvahiTimeout *, const struct timeval *)' (aka 'void (struct AvahiTimeout *, const struct timeval *)') [-Werror,-Wincompatible-function-pointer-types]
        ep->api.timeout_update = timeout_update;
                               ^ ~~~~~~~~~~~~~~
error: incompatible pointer types passing 'struct timeval *' to parameter of type 'const struct AvahiTimeVal *' [-Werror,-Wincompatible-pointer-types]
    api->timeout_new(api, &tv, wakeup, NULL);
                          ^~~
42 errors generated.

evverx avatar Dec 13 '22 06:12 evverx

Anyway, I'm not sure rebasing PRs on top of my fork, running the CI there and copy-pasting compiler warnings is optimal. I think the CI should be up and running here to make things easier for everybody. Hopefully it's going to happen relatively soon once various things like access to settings affecting the CI here (or in a separate GH organization if it's decided that it should be kept there) are sorted out.

evverx avatar Dec 13 '22 06:12 evverx

I'd be happy to resurrect it if it will actually be merged :)

JPEWdev avatar Apr 17 '23 16:04 JPEWdev

Too bad this PR seems abandoned, because it solves a real issue

It isn't abandoned. It's just a large PR and it isn't clear how it can affect whatever depends on that API. I guess it should be possible to ask distros to build avahi with this PR included along with the reverse dependencies to see what breaks but it's based on a bold assumption that the reverse dependencies come with any tests.

Other than that it plainly fails to compile with clang:

  CC       libevent_watch_test-libevent-watch.o
libevent-watch.c:270:22: error: incompatible function pointer types assigning to 'AvahiTimeout *(*)(const AvahiPoll *, const struct AvahiTimeVal *, AvahiTimeoutCallback, void *)' (aka 'struct AvahiTimeout *(*)(const struct AvahiPoll *, const struct AvahiTimeVal *, void (*)(struct AvahiTimeout *, void *), void *)') from 'AvahiTimeout *(const AvahiPoll *, const struct timeval *, AvahiTimeoutCallback, void *)' (aka 'struct AvahiTimeout *(const struct AvahiPoll *, const struct timeval *, void (*)(struct AvahiTimeout *, void *), void *)') [-Wincompatible-function-pointer-types]
        ep->api.timeout_new = timeout_new;
                            ^ ~~~~~~~~~~~
libevent-watch.c:272:25: error: incompatible function pointer types assigning to 'void (*)(AvahiTimeout *, const struct AvahiTimeVal *)' (aka 'void (*)(struct AvahiTimeout *, const struct AvahiTimeVal *)') from 'void (AvahiTimeout *, const struct timeval *)' (aka 'void (struct AvahiTimeout *, const struct timeval *)') [-Wincompatible-function-pointer-types]
        ep->api.timeout_update = timeout_update;
                               ^ ~~~~~~~~~~~~~~
2 errors generated.
make[2]: *** [Makefile:858: libevent_watch_test-libevent-watch.o] Error 1
make[2]: Leaving directory '/home/vagrant/avahi/avahi-libevent'
make[1]: *** [Makefile:832: all-recursive] Error 1
make[1]: Leaving directory '/home/vagrant/avahi'
make: *** [Makefile:742: all] Error 2

and given that avahi is built in all sorts environments with all sorts of compilers it's just not something that can be merged upstream in its current form.

evverx avatar Nov 06 '23 18:11 evverx

I've updated this to fix the compiler errors.

The change in types should prevent incompatibility when using the timers; because a new type is used you have to use the Avahi API to manipulate timers when dealing with the Avahi API instead of just using struct timeval and this will correctly use the monotonic timer when possible. The down side is that that is technically a breaking API change; specifically the AvahiPoll API. In reality I suspect there won't be much actual breakage because most of the AvahiPoll objects are created from Avahi libraries themselves, and thus do the correct thing. It would only actually break if someone was hand rolling their own AvahiPoll backend instead of using a provided one.

JPEWdev avatar Nov 06 '23 19:11 JPEWdev

The down side is that that is technically a breaking API change

Agreed. I think it actually exposed a bug in the CI where the system and newly built libraries got mixed up. avahi failed to start with

avahi-daemon[23549]: avahi-daemon: starting up: symbol lookup error: /lib/libavahi-core.so.7: undefined symbol: avahi_now

evverx avatar Nov 06 '23 19:11 evverx

I can confirm that clang no longer fails to build this PR on my machine.

Given that the library should be bumped I guess it should be moved to the next milestone. There are already a couple changes breaking builds (and only Debian backported one of those patches as far as I know). I'm not sure the next release should introduce more changes like that. Hopefully the next milestone shouldn't be far from 0.9.

evverx avatar Nov 06 '23 20:11 evverx

Before I forget one tv hasn't been converted:

$ git grep 'timeval tv' | grep -v timeval.c
avahi-daemon/dbus-protocol.c:    struct timeval tv;

The typo mentioned in https://github.com/lathiat/avahi/pull/96#discussion_r1170016652 should be fixed.

I'd probably need to set up a CI build where USE_MONOTONIC is set 0 unconditionally to make sure it compiles and works too. I guess it's easy to neglect that part with no tests.

(it can be done once it's closer to its milestone though)

evverx avatar Nov 06 '23 21:11 evverx

Before I forget one tv hasn't been converted:

$ git grep 'timeval tv' | grep -v timeval.c
avahi-daemon/dbus-protocol.c:    struct timeval tv;

The typo mentioned in #96 (comment) should be fixed.

Typo was already fixed

I'd probably need to set up a CI build where USE_MONOTONIC is set 0 unconditionally to make sure it compiles and works too. I guess it's easy to neglect that part with no tests.

(it can be done once it's closer to its milestone though)

JPEWdev avatar Nov 06 '23 21:11 JPEWdev

There were two typos. Looking at

diff --git a/avahi-common/timeval.c b/avahi-common/timeval.c
index 1918f40..382c650 100644
--- a/avahi-common/timeval.c
+++ b/avahi-common/timeval.c
@@ -34,7 +34,7 @@
 #include "timeval.h"
 
 #if defined(HAVE_CLOCK_GETTIME) && defined(_POSIX_MONOTONIC_CLOCK)
-#define USE_MONTONIC 1
+#define USE_MONOTONIC 1
 #else
 #define USE_MONTONIC 0
 #endif

it seems one of them was fixed. It could be I look at the wrong patch though

evverx avatar Nov 06 '23 21:11 evverx

There were two typos. Looking at

diff --git a/avahi-common/timeval.c b/avahi-common/timeval.c
index 1918f40..382c650 100644
--- a/avahi-common/timeval.c
+++ b/avahi-common/timeval.c
@@ -34,7 +34,7 @@
 #include "timeval.h"
 
 #if defined(HAVE_CLOCK_GETTIME) && defined(_POSIX_MONOTONIC_CLOCK)
-#define USE_MONTONIC 1
+#define USE_MONOTONIC 1
 #else
 #define USE_MONTONIC 0
 #endif

it seems one of them was fixed. It could be I look at the wrong patch though

Oops. You are correct

JPEWdev avatar Nov 06 '23 21:11 JPEWdev