pim6sd

Timer accuracy is way off

Open mweinelt opened this issue 6 years ago • 4 comments

The accuracy with which timers run is way off. There are times when the seconds pass quite normally and then the timers are stuck for minutes at a time.

This leads to missing MLD queries and therefore missing group joins.

This happens with a granularity of 5. Setting a granularity of 1 leads to much more reasonable timer behaviour.

The hardware this happens on is x86_64 (APU2c4) with Linux 4.19.37.

mweinelt • May 18 '19 03:05

On a 32-bit armhf/ARMv7 device (Odroid U3) with a 4.19.28 kernel as provided by Debian Sid, both "granularity 1" and "granularity 5" seemed to transmit MLD queries at regular intervals. I have not been able to reproduce the issue on my side yet.

T-X • May 18 '19 03:05

This is serious, the default is 5 seconds ... I've seen similar issues in pimd and mrouted, which use the same pattern for their timer handling. I'll look into it!

Update: pimd has the same 5 second granularity, but it's hard-coded and cannot be configured like in pim6sd. The code seems to be similar, if not identical. I'll add some debug messages from pimd to increase the observability a bit.

troglobit • May 18 '19 10:05

OK, there's one case when pim6sd will wait "forever", i.e. call select() with NULL as timeout. That's when there are no timers to service, typically when there is more than one router on a link and ours is not the elected querier. In that case pim6sd will wait for the next incoming message before it wakes up.
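To illustrate the pattern (a minimal sketch, not the actual pim6sd code): assume a hypothetical `secs_to_next` value that holds the seconds until the earliest pending timer, or -1 when the timer queue is empty. The helper names `next_timeout()` and `wait_for_event()` are made up for this example; only `select()` and `struct timeval` are real APIs.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper: turn "seconds until the next timer" into the
 * timeout argument for select().  A negative value means the timer
 * queue is empty, so we return NULL and select() blocks indefinitely.
 */
static struct timeval *next_timeout(int secs_to_next, struct timeval *tv)
{
    if (secs_to_next < 0)
        return NULL;            /* no timers: wait "forever" */

    tv->tv_sec  = secs_to_next;
    tv->tv_usec = 0;
    return tv;
}

/*
 * Hypothetical helper: wait for input on fd or for the next timer,
 * whichever comes first.  Returns select()'s result: 0 on timeout,
 * >0 when fd is readable, -1 on error.
 */
static int wait_for_event(int fd, int secs_to_next)
{
    struct timeval tv;
    fd_set rfds;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    return select(fd + 1, &rfds, NULL, NULL, next_timeout(secs_to_next, &tv));
}
```

With `secs_to_next == -1` the call only returns once the socket becomes readable, which matches the "wait for the next incoming message" behaviour described above.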

There should at least be a router/querier timeout, in case the neighboring (elected) querier stops sending queries. So that seems to be a bug/omission.

Update: Nah, my bad. There's a MLD6_OTHER_QUERIER_PRESENT_INTERVAL timer to handle that, default 255 sec.
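For reference, the 255 sec default follows from the MLD defaults in RFC 2710: Robustness Variable 2, Query Interval 125 sec, and Query Response Interval 10 sec. A small sketch of the arithmetic, using hypothetical names rather than the macros pim6sd actually defines:

```c
/* MLD defaults from RFC 2710; the identifiers are illustrative only. */
enum {
    ROBUSTNESS_VARIABLE     = 2,
    QUERY_INTERVAL          = 125,  /* seconds */
    QUERY_RESPONSE_INTERVAL = 10    /* seconds */
};

/*
 * Other Querier Present Interval:
 *   Robustness Variable * Query Interval + Query Response Interval / 2
 */
static int other_querier_present_interval(int robustness,
                                          int query_interval,
                                          int query_response_interval)
{
    return robustness * query_interval + query_response_interval / 2;
}
```

Called with the defaults above, this gives 2 * 125 + 10 / 2 = 255 seconds.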

troglobit • May 18 '19 11:05

I recently did a huge refactor of mrouted where one of the targets was replacing the inherently broken timer implementation with pev, which is a generic UNIX event library.

However, the refactor benefited from the existing regression test suite, which we don't have for pim6sd yet. Any refactor in this project would likely need such a test suite, because manually verifying every change is a tedious and error-prone task.

troglobit • Dec 13 '24 11:12