pim6sd
Timer accuracy is way off
The timers run very inaccurately. At times the seconds pass quite normally, and then the timers are stuck for minutes at a time.
This leads to missing MLD queries and therefore missing group joins.
This happens with a granularity of 5. Setting a granularity of 1 leads to much more reasonable timer behaviour.
The hardware this happens on is x86_64 (APU2c4) with Linux 4.19.37.
On a 32-bit armhf/ARMv7 device (Odroid U3) with a 4.19.28 kernel as provided by Debian Sid, both "granularity 1" and "granularity 5" seemed to transmit MLD queries at regular intervals. I have not been able to reproduce the issue on my side yet.
This is serious, the default is 5 seconds ... I've seen similar issues in pimd and mrouted, which use the same pattern for their timer handling. I'll look into it!
Update: pimd has the same 5 second granularity, but it's hard-coded and cannot be configured like in pim6sd. The code seems to be similar, if not identical. I'll add some debug messages from pimd to increase the observability a bit.
OK, there's one case where pim6sd will wait "forever", i.e. call select() with NULL as the timeout. That happens when there are no timers to service, typically when there is more than one router on a link and ours is not the elected querier. In that case pim6sd will wait for the next incoming message before it wakes up.
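To illustrate the select() behaviour described above, here is a minimal sketch of how a daemon loop might pick its timeout. It is not taken from the pim6sd sources: `struct timer_entry`, `next_timeout()` and `wait_for_event()` are hypothetical names, and the real timer queue may be organised differently. The point is only that an empty queue yields a NULL timeout, which makes select() block until a descriptor becomes readable.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical timer entry: seconds remaining until it fires. */
struct timer_entry {
    long remaining_sec;
    struct timer_entry *next;
};

/*
 * Return the timeout to hand to select(): a pointer to *out filled with
 * the shortest remaining time, or NULL when the queue is empty. A NULL
 * timeout makes select() block until a descriptor is ready.
 */
struct timeval *next_timeout(const struct timer_entry *queue,
                             struct timeval *out)
{
    const struct timer_entry *t;
    long min;

    assert(out != NULL);
    if (queue == NULL)
        return NULL;            /* nothing to service: wait "forever" */

    min = queue->remaining_sec;
    for (t = queue->next; t != NULL; t = t->next)
        if (t->remaining_sec < min)
            min = t->remaining_sec;

    out->tv_sec = min > 0 ? min : 0;    /* overdue timers fire at once */
    out->tv_usec = 0;
    return out;
}

/* One pass of a daemon loop: sleep until fd is readable or a timer is due. */
int wait_for_event(int fd, const struct timer_entry *queue)
{
    fd_set rfds;
    struct timeval tv;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, next_timeout(queue, &tv));
}
```

With a queue holding timers of 10 and 3 seconds, `next_timeout()` fills in a 3 second timeout. With an empty queue it returns NULL, and `wait_for_event()` only returns once a packet arrives on `fd`.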
There should at least be a router/querier timeout, in case the neighboring (elected) querier stops sending queries. So that seems to be a bug/omission.
Update: Nah, my bad. There's an MLD6_OTHER_QUERIER_PRESENT_INTERVAL timer to handle that, default 255 sec.
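For reference, the 255 second default matches the Other Querier Present Interval from RFC 2710: Robustness Variable times Query Interval, plus half a Query Response Interval. The sketch below just reproduces that arithmetic with the RFC defaults (2, 125 s, 10 s). The function and macro names are made up for illustration, and whether pim6sd derives its constant exactly this way is an assumption.

```c
#include <assert.h>

/* RFC 2710 defaults; these macro names are illustrative, not pim6sd's. */
#define ROBUSTNESS_VARIABLE          2
#define QUERY_INTERVAL_SEC           125
#define QUERY_RESPONSE_INTERVAL_SEC  10

/*
 * Other Querier Present Interval, in seconds:
 *   robustness * query_interval + query_response_interval / 2
 */
int other_querier_present_interval(int robustness, int query_interval,
                                   int query_response_interval)
{
    assert(robustness > 0 && query_interval > 0 &&
           query_response_interval >= 0);
    return robustness * query_interval + query_response_interval / 2;
}
```

Calling `other_querier_present_interval(ROBUSTNESS_VARIABLE, QUERY_INTERVAL_SEC, QUERY_RESPONSE_INTERVAL_SEC)` gives 2 * 125 + 10 / 2 = 255, so a non-querier router would take over roughly four minutes after the elected querier goes silent.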
I recently did a huge refactor of mrouted where one of the targets was replacing the inherently broken timer implementation with pev, which is a generic UNIX event library.
However, that refactor benefited from the existing regression test suite, which we don't have for pim6sd yet. Any refactor in this project would likely need such a test suite, because manually verifying every change is a tedious and error-prone task.