netbird icon indicating copy to clipboard operation
netbird copied to clipboard

fix(router): nft tables limit number of peers source

Open nhenneaux opened this issue 3 months ago • 9 comments

Describe your changes

Fixing following errors occurring when too much peers as a source by batching nft rule add by 200 sources 3240 peers does not lead to error while 3280 fails.

2025-11-24T15:56:59.2487119Z ERRO client/internal/acl/manager.go:75: Failed to apply route ACLs: 82 errors occurred:
	* add route rule: add route rule: apply source: source: create or get ipset: failed to add for key nb-8aecb703: flush error: conn.Receive: netlink receive: no such file or directory
	* add route rule: add route rule: apply source: source: create or get ipset: failed to add for key nb-8aecb703: flush error: conn.Receive: netlink receive: no such file or directory
	(...)

OS rhel9.7, kernel 5.14.0-570.60.1.el9_6.x86_64, nftables v1.0.9 (Old Doc Yak https://github.com/netbirdio/netbird/pull/3).

Issue ticket number and link

Stack

Checklist

  • [x] Is it a bug fix
  • [ ] Is a typo/documentation fix
  • [ ] Is a feature enhancement
  • [ ] It is a refactor
  • [ ] Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • [ ] I added/updated documentation for this change
  • [x] Documentation is not needed for this change (explain why) Bug fix

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • Refactor

    • Initialization now applies very large prefix lists in batched chunks for more reliable handling; per-batch logging and improved error messages report progress and failures.
  • Tests

    • Added compatibility tests for a very large (6k+) prefix set and for an empty prefix set; the large-scale test was accidentally duplicated.

✏️ Tip: You can customize this high-level summary in your review settings.

nhenneaux avatar Nov 24 '25 17:11 nhenneaux

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Nov 24 '25 17:11 CLAassistant

Walkthrough

Replaces single bulk ipset initialization with batched adds in createIpSet: add up to maxElements initially, flush, then add remaining prefixes in sublists (≤ maxElements) with per-batch flushes and logs. Adds two tests for large (6,375) and empty prefix sets; tests are duplicated in the file.

Changes

Cohort / File(s) Summary
IpSet creation batching
client/firewall/nftables/router_linux.go
Introduces maxPrefixesSet (1500) and computes maxElements = maxPrefixesSet * 2. Changes createIpSet to add an initial slice of elements up to maxElements, flush, then iterate remaining elements in sublists of ≤ maxElements calling SetAddElements, flushing and logging after each batch; errors and logs include sublist counts and contextual info.
Large-prefix & empty-prefix tests (duplicated)
client/firewall/nftables/manager_linux_test.go
Adds TestNftablesManagerCompatibilityWithIptablesFor6kPrefixes (builds 6,375 prefixes) and TestNftablesManagerCompatibilityWithIptablesForEmptyPrefixes; each test skips when nftables/iptables-save absent, creates/initializes manager, applies AddRouteFiltering, verifies iptables-save output, and cleans up. The test functions are duplicated within the file.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant C as createIpSet
    participant K as kernel ipset
    participant L as logger

    rect rgba(70,130,180,0.06)
    C->>K: Create set (metadata)
    end

    rect rgba(60,179,113,0.06)
    C->>K: AddElements(initialElements ≤ maxElements)
    K-->>C: OK / error
    C->>K: Flush set
    C-->>L: Log initial creation (count)
    end

    alt remaining elements exist
        loop for each sublist (≤ maxElements)
            rect rgba(255,215,0,0.04)
            C->>K: SetAddElements(sublist)
            K-->>C: OK / error (error includes sublist count)
            C->>K: Flush set
            C-->>L: Log sublist progress (count)
            end
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Verify batching math and boundary conditions (maxPrefixesSet → maxElements) and off-by-one behavior.
  • Confirm flush ordering and impact of partial failures on ipset state and rollback considerations.
  • Inspect error wrapping/messages for useful context (which batch failed, prefix counts).
  • Review duplicated tests for accidental duplication and environment assumptions (nftables + iptables-save).

Suggested reviewers

  • mlsmaycon
  • pappz

Poem

🐰 I sorted prefixes, chunk by chunk,
A first small heap, then batches in a bunch.
I flushed each batch and logged with pride,
If one batch trips, I point the side.
Hoppy nets — tidy routes worldwide.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: fixing nftables peer source limitations through batching, which is the primary purpose of this bug fix.
Description check ✅ Passed The description provides the required information: issue context with specific error details, reproduction steps, environment details (RHEL 9.7, nftables v1.0.9), the batching solution (200 sources), and properly completes the template with bug fix checkbox and documentation justification.
✨ Finishing touches
  • [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1bff5cfee4528853c3c60c7f004b69f6855cdff3 and b1243d1736362ee02556f49c0f2a2204d33c381d.

📒 Files selected for processing (1)
  • client/firewall/nftables/router_linux.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • client/firewall/nftables/router_linux.go

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Nov 24 '25 17:11 coderabbitai[bot]

great stuff @nhenneaux, thank you! The team is looking into that

braginini avatar Nov 25 '25 09:11 braginini

I reproduced the issue with failing at 3277 prefixes on nftables v1.0.9 with Ubuntu 24.04.3 LTS,  6.14.0-35-generic #35~24.04.1-Ubuntu, 3276 prefixes work. It reproduces with TestNftablesManagerCompatibilityWithIptablesFor6kPrefixes and it is fixed with new batching mechanism. I still would like to confirm whether the CodeRabbit comment is relevant or not on the AND/OR semantic between ipset

Initial issue is on rhel9.7, kernel 5.14.0-570.60.1.el9_6.x86_64, nftables v1.0.9 (Old Doc Yak #3).

nhenneaux avatar Nov 25 '25 17:11 nhenneaux

I addressed the issue on multiple ipset not working with OR semantic (I haven't found resources confirming CodeRabbit but it makes sense) It also addresses the issue with other IpSet with too much elements. Limit seems to be 1638 prefixes (3276 elements), it fails above.

nhenneaux avatar Nov 26 '25 13:11 nhenneaux

@coderabbitai review

nhenneaux avatar Nov 26 '25 13:11 nhenneaux

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai[bot] avatar Nov 26 '25 13:11 coderabbitai[bot]

Hi @braginini

We have tested the fix in our environment and it seems to work as expected without the error preventing acl with huge number of peers. Could you check this to integrate the fix in the upstream? :pray:

nhenneaux avatar Dec 03 '25 12:12 nhenneaux