FlushSet is unreliable: elements occasionally remain in the set
I'm running into occasional errors when working with sets/maps. I narrowed this down to the FlushSet function, which sometimes doesn't actually flush the set, i.e. the set is sometimes not empty after FlushSet.
It usually works as expected, but in 5-10% of cases the existing elements remain in the set.
I observed this behavior with both v0.3.0 and the current state of the master branch. Kernel information:
```
❯ uname -a
Linux linode-sg 5.15.0-131-generic #141-Ubuntu SMP Fri Jan 10 21:18:28 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
```
Here's a minimal example to reproduce the bug. It does the following:
- It creates a table and a set.
- Adds some elements to the set.
- Flushes the set using FlushSet.
- Checks that there are 0 elements remaining.
```go
package loadbalancer

import (
	"testing"

	"github.com/google/nftables"
	"github.com/google/nftables/binaryutil"
	"github.com/stretchr/testify/require"
)

func TestFlushSet(t *testing.T) {
	conn := &nftables.Conn{}

	table := &nftables.Table{
		Name:   "test_table",
		Family: nftables.TableFamilyIPv4,
	}
	conn.AddTable(table)
	t.Cleanup(func() {
		conn.DelTable(table)
		require.NoError(t, conn.Flush(), "failed to delete table")
	})

	set := &nftables.Set{
		Name:    "test_set",
		Table:   table,
		IsMap:   false,
		KeyType: nftables.TypeInetService, // 2-byte key (e.g., ports)
	}
	require.NoError(t, conn.AddSet(set, nil), "failed to add set")

	elements := []nftables.SetElement{
		{Key: binaryutil.BigEndian.PutUint16(80)},   // Port 80
		{Key: binaryutil.BigEndian.PutUint16(443)},  // Port 443
		{Key: binaryutil.BigEndian.PutUint16(8080)}, // Port 8080
	}
	require.NoError(t, conn.SetAddElements(set, elements), "failed to add elements")
	require.NoError(t, conn.Flush(), "failed to flush added elements")

	conn.FlushSet(set)
	require.NoError(t, conn.Flush(), "failed to flush the flush")

	// check that the set is empty
	elems, err := conn.GetSetElements(set)
	require.NoError(t, err, "failed to get set elements after flush")
	require.Empty(t, elems, "set not empty after flush")
}
```
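
Since the failure is intermittent, a single run of the test above passes most of the time, so it helps to repeat the cycle and count how often it fails. Below is a minimal sketch of such a counter, not part of the library. The names `countFailures` and `errNotEmpty` are hypothetical, and the `simulated` function only stands in for one real create/add/flush/check cycle (which needs root and a kernel with nf_tables), so the printed rate says nothing about the actual bug.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotEmpty is a hypothetical sentinel for "elements remained after FlushSet".
var errNotEmpty = errors.New("set not empty after flush")

// countFailures runs attempt n times and returns how many runs returned a
// non-nil error. attempt is assumed to perform one full
// create/add/flush/check cycle and report a leftover element as an error.
func countFailures(n int, attempt func() error) int {
	failures := 0
	for i := 0; i < n; i++ {
		if err := attempt(); err != nil {
			failures++
		}
	}
	return failures
}

func main() {
	// Simulated attempt that fails on every 10th call. Replace it with the
	// real nftables cycle to measure the actual failure rate.
	calls := 0
	simulated := func() error {
		calls++
		if calls%10 == 0 {
			return errNotEmpty
		}
		return nil
	}

	const runs = 200
	failures := countFailures(runs, simulated)
	fmt.Printf("%d of %d runs failed\n", failures, runs)
}
```

A similar effect is available without extra code by repeating the test itself, e.g. `go test -run TestFlushSet -count=100`, which reruns the whole test, including table setup and cleanup, each time.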
cc @twitchyliquid64 / @twitchy-jsonp who originally contributed set support back in 2019
I had a look into this but wasn't able to reproduce the issue. I am running kernel 6.12.10-76061203-generic but I also tried Ubuntu 22 (5.15.0-143-generic) on Vagrant.
I recently came across this discussion, so I am wondering whether it's related.
To test this theory I wrote a test that creates a large set and flushes it repeatedly (100 times in total). The test took about 22 minutes to run on my machine, but I still wasn't able to trigger a failure.