[WIP] Workaround to fix Kernel bug related ipset entry deletion

Open murali-reddy opened this issue 7 years ago • 8 comments

If the kernel version falls within the range of affected kernels, resync the ipset entries to the expected set of entries.

https://bugzilla.netfilter.org/show_bug.cgi?id=1119

ipset (v6.30, v6.29, v6.25.1, but not v6.21.1) hash code is sometimes evicting (or bumping) as a 
side-effect other entries in the set upon entry deletion (ipset del).  The symptom of this is that you 
get the error "ipset v6.30: Element cannot be deleted from the set: it's not added" when deleting an 
entry that has not yet been legitimately deleted. The problem happens with a large number of entries; 
in some cases I've seen it with under 700 entries but typically it takes over 1,000 entries.  I've seen 
one, two, even three or four entries evicted on a deletion.

Fixes #3296 failed: ipset v6.32: Element cannot be deleted from the set: it's not added

murali-reddy avatar Aug 10 '18 08:08 murali-reddy

Kernel versions are a poor indicator as to what functionality/bugs are present, since some distros aggressively backport changes.

rade avatar Aug 10 '18 09:08 rade

> Kernel versions are a poor indicator as to what functionality/bugs are present, since some distros aggressively backport changes.

OK, I will check whether there is a more reliable way. Nevertheless, this workaround adds little overhead if the kernel already has the fix, and it will not cause any side effects.

murali-reddy avatar Aug 10 '18 10:08 murali-reddy

If I read it right, you are triggering off the error message we saw in #3296, but the bug will hit earlier. We need to check after every delete - maybe not straight after, maybe have a timer so we batch up multiple deletes to the same set.

bboreham avatar Aug 10 '18 11:08 bboreham

Is it still WIP (as the title suggests) or is it ready for a review?

brb avatar Aug 14 '18 12:08 brb

@brb The key logic for detecting whether the kernel has the bug is not reliable. As pointed out, the kernel version is not the best way to tell whether it has the ipset bug. Do you have a better way to reliably determine whether a kernel is affected by the ipset issue?

murali-reddy avatar Aug 14 '18 14:08 murali-reddy

I wouldn't bother with checking the kernel version, and I would enable the safe-delete for all.

Also, as we are already bookkeeping all ipset elements in NPC, wouldn't it be better to compare against the ones in NPC instead of `ipset list <...>`?

brb avatar Aug 17 '18 09:08 brb

> I wouldn't bother with checking the kernel version, and I would enable the safe-delete for all.

@brb Agreed. It's a little bit of overhead, but I cannot think of any other way to find out whether the kernel is affected by the bug.

> Also, as we are already bookkeeping all ipset elements in NPC, wouldn't it be better to compare against the ones in NPC instead of `ipset list <...>`?

Could you point me to where the bookkeeping of ipset elements is done? My reasoning is that, since this is a helper function, we don't know who the consumers are or whether they do any bookkeeping. Ideally I would like to contain the changes within the helper function, without relying on any help from the consumers of this utility library.

murali-reddy avatar Aug 17 '18 11:08 murali-reddy

E.g. https://github.com/weaveworks/weave/blob/master/npc/selector.go#L170. I'm suggesting that we always sync against the ipsets tracked by NPC, as these are the source of truth.

brb avatar Aug 17 '18 12:08 brb
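
To illustrate what syncing against tracked entries might involve, here is a minimal sketch under stated assumptions. `EntrySet`, `Reconcile`, and `memSet` are hypothetical names introduced for this example and are not the weave `ipset` or NPC API; `memSet` is only an in-memory stand-in for a kernel set.

```go
package main

import "sort"

// EntrySet is a hypothetical view of one kernel ipset (not the weave API).
type EntrySet interface {
	List() ([]string, error) // entries currently present in the kernel set
	Add(entry string) error  // (re-)add one entry
}

// Reconcile re-adds every tracked entry that is missing from the set,
// treating the tracked entries as the source of truth. It returns the
// restored entries in sorted order.
func Reconcile(set EntrySet, tracked map[string]struct{}) ([]string, error) {
	actual, err := set.List()
	if err != nil {
		return nil, err
	}
	present := make(map[string]struct{}, len(actual))
	for _, e := range actual {
		present[e] = struct{}{}
	}
	var missing []string
	for e := range tracked {
		if _, ok := present[e]; !ok {
			missing = append(missing, e)
		}
	}
	sort.Strings(missing)
	for _, e := range missing {
		if err := set.Add(e); err != nil {
			return nil, err
		}
	}
	return missing, nil
}

// memSet is an in-memory stand-in for a kernel set, for illustration only.
type memSet map[string]struct{}

func (m memSet) List() ([]string, error) {
	out := make([]string, 0, len(m))
	for e := range m {
		out = append(out, e)
	}
	return out, nil
}

func (m memSet) Add(entry string) error {
	m[entry] = struct{}{}
	return nil
}
```

If an entry was evicted as a side effect of an unrelated delete, it is still in `tracked` but absent from `List()`, so `Reconcile` puts it back. The sketch only restores missing entries; whether stale extra entries should also be removed is a separate decision that this example does not make.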