packages
packages copied to clipboard
ebtables-tiny: Fix lockfile function and reduce wait on lock
During investigation of an issue with ebtables we discovered that the locking mechanism in ebtables-tiny did not work as expected. These commits should fix this issue.
These commits were part of the Freifunk Braunschweig parker -releases for about an year now. We did not discover any negative impact with these changes.
Please also advise: Should I open a separate PR on the main gluon repo once this is merged? Or do you pick up changes from here automagically?
Waiting until the lock releases indefinitely is the intended behavior. The comment on the function is incorrect, not the code (the same incorrect comment exists in "full" ebtables). I believe the comment was correct at some point in the past, but the code was improved without updating the comment.
Waiting until the lock releases indefinitely is the intended behavior. The comment on the function is incorrect, not the code (the same incorrect comment exists in "full" ebtables). I believe the comment was correct at some point in the past, but the code was improved without updating the comment.
The original code returned from lock_file with -1 if another process had the lock: https://git.netfilter.org/ebtables/commit/?id=8b5594d7c21f3056c8c194aea1f1519c006aeaee
When it was changed to use flock() it never returned as long another process had the lock, making the retry loop with sleep in ebt_get_kernel_table useless. No "Trying to obtain lock" is ever printed, making it hard to diagnose this case from logs.
https://github.com/freifunk-gluon/gluon/blob/main/package/gluon-radv-filterd/src/gluon-radv-filterd.c also has code to handle timeouts of ebtables-tiny. We're not sure how this is supposed to interact. :)
In theory, ebtables should never run long enough for any of this to matter. If the lock is ever held longer than a few milliseconds, that seems like a bug to me, not the code that waits for the lock to release.
In theory, ebtables should never run long enough for any of this to matter. If the lock is ever held longer than a few milliseconds, that seems like a bug to me, not the code that waits for the lock to release.
Agreed. We added this change because we saw hanging ebtables-tiny processes and wanted to investigate where they got stuck (e.g on the lock or somewhere else). Without these log messages in a (working) retry loop, there is no clear indication which of those ebtables-tiny processes is hanging where.
@neocturne @jluebbe how to proceed here? :)
@neocturne and me agreed on IRC to close this PR. it's a kernel bug and this PR isn't the right solution to it - and hopefully there will be a totally different solution in the future.
@rotanid Do you have some details on the kernel bug?
Basically, every ebtables call into the kernel should return quickly. Waiting indefinitely for other ebtables processes to exit is the correct and intended behavior (as "indefinitely" should always be <1s here) - but it is also unclear to me how the retry implemented in this PR even fixes anything, even if there is a kernel bug, unless it somehow breaks the locking altogether.